Modify r object with rpy2
I'm trying to use rpy2 to run the DESeq2 R/Bioconductor package from Python.
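I actually solved my problem while writing my question (do_slot gives access to the attributes of R objects), but I think the example may be useful to others, so here is how I do it in R and how it translates to Python: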
I can create a "DESeqDataSet" from two data frames as follows:
counts_data <- read.table("long/path/to/file",
header=TRUE, row.names="gene")
head(counts_data)
## WT_RT_1 WT_RT_2 prg1_RT_1 prg1_RT_2
## aap-1 406 311 41 95
## aat-1 5 8 2 0
## aat-2 1 1 0 0
## aat-3 13 12 0 1
## aat-4 6 6 2 3
## aat-5 3 1 1 0
col_data <- DataFrame(lib = c("WT", "WT", "prg1", "prg1"),
treat = c("RT", "RT", "RT", "RT"),
rep = c("1", "2", "1", "2"),
row.names = colnames(counts_data))
head(col_data)
## DataFrame with 4 rows and 3 columns
## lib treat rep
## <character> <character> <character>
## WT_RT_1 WT RT 1
## WT_RT_2 WT RT 2
## prg1_RT_1 prg1 RT 1
## prg1_RT_2 prg1 RT 2
dds <- DESeqDataSetFromMatrix(countData = counts_data,
colData = col_data,
design = ~ lib)
## Warning message:
## In DESeqDataSet(se, design = design, ignoreRank) :
## some variables in design formula are characters, converting to factors
dds
## class: DESeqDataSet
## dim: 18541 4
## metadata(1): version
## assays(1): counts
## rownames(18541): aap-1 aat-1 ... WBGene00255550 WBGene00255553
## rowData names(0):
## colnames(4): WT_RT_1 WT_RT_2 prg1_RT_1 prg1_RT_2
## colData names(3): lib treat rep
To make sure the analysis uses the correct control, I need to relevel a factor, which can be accessed with the "double bracket" syntax:
dds[["lib"]]
## [1] WT WT prg1 prg1
## Levels: prg1 WT
dds[["lib"]] <- relevel(dds[["lib"]], ref="WT")
dds[["lib"]]
## [1] WT WT prg1 prg1
## Levels: WT prg1
Then I can run the analysis:
dds <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
res <- results(dds)
I look at the results for a given gene:
res["his-10",]
## log2 fold change (MAP): lib prg1 vs WT
## Wald test p-value: lib prg1 vs WT
## DataFrame with 1 row and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue padj
## <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## his-10 586.5464 3.136174 0.2956132 10.60904 2.705026e-26 8.78785e-25
Now, I would like to do the same in Python with rpy2.
I seem to have successfully created the object from pandas data frames:
import pandas as pd
from rpy2.robjects import r, pandas2ri, Formula
as_df = r("as.data.frame")
from rpy2.robjects.packages import importr
deseq2 = importr("DESeq2")
counts_data = pd.read_table("long/path/to/file", index_col=0)
col_data = pd.DataFrame({
"cond_names" : counts_data.columns,
"lib" : ["WT", "WT", "prg1", "prg1"],
"rep" : ["1", "1", "2", "2"],
"treat" : ["RT", "RT", "RT", "RT"]})
col_data.set_index("cond_names", inplace=True)
pandas2ri.activate() # makes some conversions automatic
dds = deseq2.DESeqDataSetFromMatrix(
countData=counts_data,
colData=col_data,
design=Formula("~lib"))
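The sample names in this example appear to encode lib, treat and rep (for instance WT_RT_1), so the annotation columns could be derived from the column names instead of being typed by hand. Here is a minimal pure-Python sketch, assuming that a `lib_treat_rep` naming convention holds for every sample; `sample_annotations` is a hypothetical helper, not part of pandas or rpy2:

```python
def sample_annotations(sample_names):
    """Split names like 'WT_RT_1' into lib, treat and rep columns.

    Assumption: every name follows the pattern <lib>_<treat>_<rep>.
    """
    columns = {"cond_names": [], "lib": [], "treat": [], "rep": []}
    for name in sample_names:
        # Split from the right so that a lib name containing "_" stays whole.
        lib, treat, rep = name.rsplit("_", 2)
        columns["cond_names"].append(name)
        columns["lib"].append(lib)
        columns["treat"].append(treat)
        columns["rep"].append(rep)
    return columns


names = ["WT_RT_1", "WT_RT_2", "prg1_RT_1", "prg1_RT_2"]
annotations = sample_annotations(names)
for key, values in annotations.items():
    print(key, values)
```

The resulting dict could be passed to pd.DataFrame and indexed with set_index("cond_names") as in the code above. For the four names shown, rep comes out as 1, 2, 1, 2, as in the R col_data.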
In IPython (where I actually ran the previous commands), I can use do_slot to look inside the object and try to identify the factor that needs to be releveled:
In [229]: tuple(dds.do_slot("colData").slotnames())
Out[229]: ('rownames', 'nrows', 'listData', 'elementType', 'elementMetadata', 'metadata')
In [230]: dds.do_slot("colData").do_slot("listData")
Out[230]:
R object with classes: ('list',) mapped to:
<ListVector - Python:0x7f2ae2590a08 / R:0x108fcdd0>
[FactorVector, FactorVector, FactorVector]
lib: <class 'rpy2.robjects.vectors.FactorVector'>
R object with classes: ('factor',) mapped to:
<FactorVector - Python:0x7f2ae20f1c08 / R:0x136a3920>
[ 2, 2, 1, 1]
rep: <class 'rpy2.robjects.vectors.FactorVector'>
R object with classes: ('factor',) mapped to:
<FactorVector - Python:0x7f2a9600c948 / R:0x136a30f0>
[ 1, 1, 2, 2]
treat: <class 'rpy2.robjects.vectors.FactorVector'>
R object with classes: ('factor',) mapped to:
<FactorVector - Python:0x7f2a9600ccc8 / R:0x136a3588>
[ 1, 1, 1, 1]
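The integers printed for each FactorVector are 1-based codes into that factor's levels. The pure-Python sketch below involves no rpy2; `encode_factor` and `relevel` are hypothetical stand-ins, not the R or rpy2 functions. It shows how the codes for lib relate to the level order reported in the R session (prg1, WT), and what moving WT to the front should do to them:

```python
def encode_factor(values, levels):
    """Return the 1-based integer codes of `values` with respect to `levels`."""
    return [levels.index(value) + 1 for value in values]


def relevel(levels, ref):
    """Move `ref` to the front, keeping the other levels in their order."""
    return [ref] + [level for level in levels if level != ref]


lib = ["WT", "WT", "prg1", "prg1"]
levels = ["prg1", "WT"]  # level order reported by R before releveling

codes_before = encode_factor(lib, levels)
new_levels = relevel(levels, "WT")
codes_after = encode_factor(lib, new_levels)

print(levels, codes_before)      # ['prg1', 'WT'] [2, 2, 1, 1]
print(new_levels, codes_after)   # ['WT', 'prg1'] [1, 1, 2, 2]
```

So a lib factor printed as [2, 2, 1, 1] still has prg1 as its first (reference) level, and a successful relevel to WT should show [1, 1, 2, 2].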
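The factor I want to relevel is the first one, because "lib" is the first column of the col_data data frame passed to the deseq2.DESeqDataSetFromMatrix function (edit: I realize that "lib" is actually written in the description of the R object).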
Applying relevel to the attribute accessed through do_slot seems to work:
In [231]: dds.do_slot("colData").do_slot("listData")[0] = r.relevel(dds.do_slot("colData").do_slot("listData")[0], ref="WT")
In [232]: dds.do_slot("colData").do_slot("listData")
Out[232]:
R object with classes: ('list',) mapped to:
<ListVector - Python:0x7f2a95078508 / R:0x108fcdd0>
[FactorVector, FactorVector, FactorVector]
lib: <class 'rpy2.robjects.vectors.FactorVector'>
R object with classes: ('factor',) mapped to:
<FactorVector - Python:0x7f2a9600bb88 / R:0x12a7ff60>
[ 1, 1, 2, 2]
rep: <class 'rpy2.robjects.vectors.FactorVector'>
R object with classes: ('factor',) mapped to:
<FactorVector - Python:0x7f2ae2568888 / R:0x136a30f0>
[ 1, 1, 2, 2]
treat: <class 'rpy2.robjects.vectors.FactorVector'>
R object with classes: ('factor',) mapped to:
<FactorVector - Python:0x7f2ae2568848 / R:0x136a3588>
[ 1, 1, 1, 1]
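The assignment above relies on "lib" being at position 0. Here is a hedged sketch of a helper that looks the column up by name instead. `relevel_coldata_factor` is a hypothetical name, and the helper only assumes that the object exposes do_slot, that the list has a names attribute, and that it supports item assignment, as in the session above. R's relevel is passed in as an argument (for example r.relevel) so that the block itself has no rpy2 import:

```python
def relevel_coldata_factor(dds, column, ref, relevel):
    """Relevel one colData factor of a DESeqDataSet-like object, in place.

    `relevel` is expected to be R's relevel function, e.g. `r.relevel`
    from `rpy2.robjects`. Returns the position of the column that was changed.
    """
    list_data = dds.do_slot("colData").do_slot("listData")
    # Find the column by name rather than assuming it comes first.
    position = list(list_data.names).index(column)
    list_data[position] = relevel(list_data[position], ref=ref)
    return position


# Intended use with the objects from the session above (not run here):
# relevel_coldata_factor(dds, "lib", "WT", r.relevel)
```

If the column name is absent, list.index raises ValueError, which seems preferable to silently releveling the wrong factor.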
Then I run the analysis part:
In [233]: dds = deseq2.DESeq(dds)
/home/bli/.local/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:186: RRuntimeWarning: estimating size factors
warnings.warn(x, RRuntimeWarning)
/home/bli/.local/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:186: RRuntimeWarning: estimating dispersions
warnings.warn(x, RRuntimeWarning)
/home/bli/.local/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:186: RRuntimeWarning: gene-wise dispersion estimates
warnings.warn(x, RRuntimeWarning)
/home/bli/.local/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:186: RRuntimeWarning: mean-dispersion relationship
warnings.warn(x, RRuntimeWarning)
/home/bli/.local/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:186: RRuntimeWarning: final dispersion estimates
warnings.warn(x, RRuntimeWarning)
/home/bli/.local/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:186: RRuntimeWarning: fitting model and testing
warnings.warn(x, RRuntimeWarning)
In [234]: res = pandas2ri.ri2py(as_df(deseq2.results(dds)))
In [235]: res.index.names = ["gene"]
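The DESeq progress messages arrive as Python warnings of category RRuntimeWarning, so the standard warnings module should be able to hide them. Below is a minimal sketch with a hypothetical `silenced` context manager. The commented usage assumes an rpy2 version in which RRuntimeWarning can be imported from rpy2.rinterface, as the traceback paths above suggest; other versions may route R console output differently.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def silenced(category):
    """Temporarily ignore warnings of the given category."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category)
        yield


# Intended use (assumption: RRuntimeWarning is importable from rpy2.rinterface):
# from rpy2.rinterface import RRuntimeWarning
# with silenced(RRuntimeWarning):
#     dds = deseq2.DESeq(dds)
```

Filters are restored when the block exits, so genuine warnings raised elsewhere remain visible.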
Now, checking the results for the test gene:
In [236]: res.loc["his-10"]
Out[236]:
baseMean 5.865464e+02
log2FoldChange 3.136174e+00
lfcSE 2.956132e-01
stat 1.060904e+01
pvalue 2.705026e-26
padj 8.787850e-25
Name: his-10, dtype: float64
The results returned by Python are the same as in R.
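As a sanity check on the numbers themselves, the reported columns can be related to each other with the standard library alone. This sketch assumes the usual Wald-test relations (stat is the log2 fold change divided by its standard error, and pvalue is the two-sided tail probability of a standard normal), with the values copied from the his-10 row:

```python
import math

# Values copied from the his-10 row shown above.
log2_fold_change = 3.136174
lfc_se = 0.2956132
reported_stat = 10.60904
reported_pvalue = 2.705026e-26

# Wald statistic: estimated log2 fold change divided by its standard error.
stat = log2_fold_change / lfc_se

# Two-sided normal tail probability, 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
# erfc avoids the loss of precision of computing 1 - cdf for such a large z.
pvalue = math.erfc(abs(reported_stat) / math.sqrt(2))

# Fold change on the linear scale (prg1 relative to WT).
fold_change = 2 ** log2_fold_change

print(f"stat: {stat:.5f} (reported {reported_stat})")
print(f"pvalue: {pvalue:.6e} (reported {reported_pvalue})")
print(f"fold change: {fold_change:.2f}")
```

Both recomputed values agree with the table up to the rounding of the printed inputs. The log2 fold change of about 3.14 corresponds to a roughly 8.8-fold higher level in prg1 than in WT.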
I found the code examples that helped me solve my problem in the rpy2 documentation: http : rpy2
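The attributes of R objects can be accessed through the do_slot method, which takes the attribute name as its argument. See the question for the full solution.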
Edit:
There is also a do_slot_assign method, which can be used, for example, to change the design formula:
>>> dds.do_slot("design").r_repr()
'~lib'
>>> dds.do_slot_assign("design", Formula("~ treat"))
>>> dds.do_slot("design").r_repr()
'~treat'