Using biomaRt to retrieve data from databases that implement BioMart
Install and load the biomaRt package.
The biomaRt package page is at: http://www.bioconductor.org/packages/release/bioc/html/biomaRt.html.
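Install it with the following commands:
## biocLite() is deprecated since Bioconductor 3.8; current releases use BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("biomaRt")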
Load the package:
library("biomaRt")
Commonly used functions
・ listMarts(mart = NULL, host = "www.biomart.org", path = "/biomart/martservice", port = 80, includeHosts = FALSE, archive = FALSE, ssl.verifypeer = TRUE, verbose = FALSE): lists the available BioMart web services (online databases).
・ useMart(biomart, dataset, host = "www.biomart.org", path = "/biomart/martservice", port = 80, archive = FALSE, ssl.verifypeer = TRUE, version, verbose = FALSE): connects to the BioMart database (biomart) you want to use.
・ listDatasets(mart, verbose = FALSE): lists the datasets available in the database (mart).
・ useDataset(dataset, mart, verbose = FALSE): selects a dataset (dataset) in the database (mart).
・ listFilters(mart, what = c("name", "description")): lists the filters supported by the dataset (mart).
・ listAttributes(mart, page, what = c("name", "description")): lists the attributes supported by the dataset (mart).
・ getBM(attributes, filters = "", values = "", mart, curl = NULL, checkFilters = TRUE, verbose = FALSE, uniqueRows = TRUE, bmHeader = FALSE): the query function. It returns the requested attributes for the records in the dataset (mart) whose filters match values. filters and values correspond one to one: with several filters, values is a list with one element per filter, in the same order.
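Take "retrieve the GO annotations of the genes with EntrezGene identifiers 673 and 837" as an example of the commonly used arguments: attributes is c('entrezgene', 'go_id'), filters is 'entrezgene', and values is c("673", "837").
・ Attributes (attributes): the information we want to retrieve from the database (a vector of attributes to retrieve).
・ Filters (filters): the conditions that restrict which records are returned.
・ Values (values): the values the filters must match, one entry per filter.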
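Because filters and values correspond one to one, a query with several filters passes values as a list whose elements follow the order of filters. The sketch below assumes Ensembl's own BioMart server at https://www.ensembl.org (where the gene mart is named ENSEMBL_MART_ENSEMBL) and a recent biomaRt release. It uses the chromosome_name, start and end filters from the filter list shown later to fetch the genes in the region 1,100,000 to 1,250,000 bp of human chromosome 16.

```r
library(biomaRt)

## Assumption: Ensembl's BioMart is reachable at https://www.ensembl.org,
## where the gene mart is called "ENSEMBL_MART_ENSEMBL".
mart <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl",
                host = "https://www.ensembl.org")

## Three filters, so values is a list of three elements in the same order:
## chromosome "16", region start 1100000 bp, region end 1250000 bp.
genes <- getBM(attributes = c("ensembl_gene_id", "chromosome_name"),
               filters    = c("chromosome_name", "start", "end"),
               values     = list("16", 1100000, 1250000),
               mart       = mart)
head(genes)
```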
Basic workflow
Get the list of databases:
mart_list <- listMarts()
Example output:
  | biomart | version |
1 | ensembl | ENSEMBL GENES 81 (SANGER UK) |
2 | snp | ENSEMBL VARIATION 81 (SANGER UK) |
3 | regulation | ENSEMBL REGULATION 81 (SANGER UK) |
4 | vega | VEGA 61 (SANGER UK) |
5 | fungi_mart_28 | ENSEMBL FUNGI 28 (EBI UK) |
6 | fungi_variations_28 | ENSEMBL FUNGI VARIATION 28 (EBI UK) |
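The signatures listed earlier default to host "www.biomart.org", the old central BioMart portal, which as far as I know is no longer online, so calling listMarts() with its defaults may fail today. A minimal sketch, assuming Ensembl's own BioMart server at https://www.ensembl.org and a recent biomaRt release, asks that host directly. On this host the gene mart is listed as ENSEMBL_MART_ENSEMBL rather than ensembl.

```r
library(biomaRt)

## Assumption: Ensembl's BioMart server is reachable at https://www.ensembl.org.
## Query it directly instead of the old central portal.
marts <- listMarts(host = "https://www.ensembl.org")
print(marts)
```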
Select the database to use:
ensembl <- useMart("ensembl")
Get the list of datasets in the selected database:
dataset_lists <- listDatasets(ensembl)
Example output (first rows):
  | dataset | description | version |
1 | oanatinus_gene_ensembl | Ornithorhynchus anatinus genes (OANA5) | OANA5 |
2 | cporcellus_gene_ensembl | Cavia porcellus genes (cavPor3) | cavPor3 |
3 | gaculeatus_gene_ensembl | Gasterosteus aculeatus genes (BROADS1) | BROADS1 |
4 | lafricana_gene_ensembl | Loxodonta africana genes (loxAfr3) | loxAfr3 |
5 | itridecemlineatus_gene_ensembl | Ictidomys tridecemlineatus genes (spetri2) | spetri2 |
6 | choffmanni_gene_ensembl | Choloepus hoffmanni genes (choHof1) | choHof1 |
Select the dataset to use:
dataset_used <- useDataset("hsapiens_gene_ensembl", mart = ensembl)
Here "hsapiens_gene_ensembl" is one of the dataset names returned by listDatasets(ensembl).
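Rather than scanning the full dataset table by eye, the human dataset name can be found with a pattern match. The sketch below assumes Ensembl's BioMart at https://www.ensembl.org (gene mart ENSEMBL_MART_ENSEMBL) and a recent biomaRt release. It filters the listDatasets() output with grepl(), then uses the dataset argument of useMart() from the signature above to connect and select the dataset in one call, which replaces the separate useDataset() step.

```r
library(biomaRt)

## Assumption: Ensembl's BioMart at https://www.ensembl.org.
ensembl <- useMart("ENSEMBL_MART_ENSEMBL", host = "https://www.ensembl.org")

## Keep only the rows whose dataset name starts with "hsapiens".
datasets <- listDatasets(ensembl)
human <- datasets[grepl("^hsapiens", datasets$dataset), ]
print(human)

## Connect and select the dataset in a single useMart() call.
dataset_used <- useMart("ENSEMBL_MART_ENSEMBL",
                        dataset = "hsapiens_gene_ensembl",
                        host = "https://www.ensembl.org")
```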
Get the filters and attributes supported by the dataset:
filter_list <- listFilters(dataset_used)
attributes_list <- listAttributes(dataset_used)
Example output (first rows of each):
filter_list:
  | name | description |
1 | chromosome_name | Chromosome name |
2 | start | Gene Start (bp) |
3 | end | Gene End (bp) |
4 | band_start | Band Start |
5 | band_end | Band End |
6 | marker_start | Marker Start |
attributes_list:
  | name | description |
1 | ensembl_gene_id | Ensembl Gene ID |
2 | ensembl_transcript_id | Ensembl Transcript ID |
3 | ensembl_peptide_id | Ensembl Protein ID |
4 | ensembl_exon_id | Ensembl Exon ID |
5 | description | Description |
6 | chromosome_name | Chromosome Name |
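Both tables run to hundreds of rows, so searching them is usually quicker than reading them. The sketch below, assuming Ensembl's BioMart at https://www.ensembl.org (gene mart ENSEMBL_MART_ENSEMBL), looks up the Entrez-related filters and the GO-related attributes needed for the getBM() query that follows, by matching on the name column with grepl().

```r
library(biomaRt)

## Assumption: Ensembl's BioMart at https://www.ensembl.org.
mart <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl",
                host = "https://www.ensembl.org")

filter_list <- listFilters(mart)
attributes_list <- listAttributes(mart)

## Filters whose name mentions "entrez" (case-insensitive), and
## attributes whose name starts with "go_".
entrez_filters <- filter_list[grepl("entrez", filter_list$name, ignore.case = TRUE), ]
go_attributes <- attributes_list[grepl("^go_", attributes_list$name), ]
print(entrez_filters)
print(go_attributes)
```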
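Use getBM to run the query you need:
entrez <- c("673", "837")
## query dataset_used: the bare 'ensembl' mart has no dataset selected
## (on Ensembl 97 and later the attribute/filter is 'entrezgene_id', not 'entrezgene')
goids <- getBM(attributes = c('entrezgene', 'go_id'), filters = 'entrezgene', values = entrez, mart = dataset_used)
head(goids)
That's it!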
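getBM() returns one row per gene/GO-term pair. To get one vector of GO IDs per gene, the result can be split by gene ID. The sketch below assumes Ensembl's BioMart at https://www.ensembl.org (gene mart ENSEMBL_MART_ENSEMBL) and a current Ensembl release, where the Entrez attribute and filter are named entrezgene_id. It also drops rows whose go_id is missing or empty, which can appear for genes without annotation.

```r
library(biomaRt)

## Assumptions: Ensembl's BioMart at https://www.ensembl.org and a release
## where the Entrez attribute/filter is "entrezgene_id".
mart <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl",
                host = "https://www.ensembl.org")

goids <- getBM(attributes = c("entrezgene_id", "go_id"),
               filters = "entrezgene_id",
               values = c("673", "837"),
               mart = mart)

## Drop rows without a GO ID, then group the GO IDs by Entrez gene ID.
goids <- goids[!is.na(goids$go_id) & goids$go_id != "", ]
go_by_gene <- split(goids$go_id, goids$entrezgene_id)
lengths(go_by_gene)  # number of GO IDs per gene
```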
About BioMart
BioMart software suite: http://www.biomart.org/. BioMart is a community-driven project to provide unified access to distributed research data to facilitate the scientific discovery process.