
How to add a new column to a pandas DataFrame based on two conditions?

I need to add a new column to a pandas DataFrame based on conditions.

Input file:

Name    C2Mean  C1Mean
a       2        0
b       4        2
c       6        2.5

These are the conditions:

if C1Mean == 0: log2FC = log2(C2Mean)            (row a: log2(2))
if C1Mean > 0:  log2FC = log2(C2Mean / C1Mean)   (row b: log2(4/2), row c: log2(6/2.5))

Based on these conditions, I want to add a new column "log2FC", as shown below:

Name    C2Mean  C1Mean  log2FC
a        2        0     1
b        4        2     1
c        6        2.5   1.2630344058

The code I tried:

import pandas as pd
import numpy as np
import os

def induced_genes(rsem_exp_data):
    pwd = os.getcwd()
    data = pd.read_csv(rsem_exp_data,header=0,sep="\t")
    data['log2FC'] = [np.log2(data['C2Mean']/data['C1Mean'])\
    if data['C2Mean'] > 0] else np.log2(data['C2Mean'])]
    print(data.head(5))

induced_genes('induced.genes')

You can use the following code:

df = pd.DataFrame({"Name":["a", "b", "c"], "C2Mean":[2,4,6], "C1Mean":[0, 2, 2.5]})

df.head()

Name    C2Mean  C1Mean
a         2     0.0
b         4     2.0
c         6     2.5

df["log2FC"] = df.apply(lambda x: np.log2(x["C2Mean"]/x["C1Mean"]) if x["C1Mean"]> 0 else np.log2(x["C2Mean"]), axis=1)

df.head()

Name    C2Mean  C1Mean  log2FC
a        2      0.0     1.000000
b        4      2.0     1.000000
c        6      2.5     1.263034

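Here axis=1 means the function is applied to each row: the lambda receives one row at a time and returns the value for that row.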
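To make the role of axis concrete, here is a minimal sketch that contrasts axis=1 with the default axis=0. The helper name log2fc is hypothetical and only used for illustration; the data is the three-row example from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["a", "b", "c"],
                   "C2Mean": [2, 4, 6],
                   "C1Mean": [0, 2, 2.5]})

def log2fc(row):
    # Hypothetical helper: `row` is a Series holding one row of df.
    if row["C1Mean"] > 0:
        return np.log2(row["C2Mean"] / row["C1Mean"])
    return np.log2(row["C2Mean"])

# axis=1: log2fc is called once per row, giving one value per row.
per_row = df.apply(log2fc, axis=1)

# axis=0 (the default): the function is called once per column instead,
# so it receives a whole column and gives one value per column.
per_column = df[["C2Mean", "C1Mean"]].apply(lambda col: col.max(), axis=0)

print(per_row)
print(per_column)
```

Writing the condition as a named function instead of a lambda is only a readability choice; the result is the same as the one-line version above.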

This should work, and it is faster than apply:

import pandas as pd
import numpy as np
df = pd.DataFrame({"Name":["a", "b", "c"], "C2Mean":[2,4,6], "C1Mean":[0, 2, 2.5]})

df["log2FC"] = np.where(df["C1Mean"]==0,
                        np.log2(df["C2Mean"]), 
                        np.log2(df["C2Mean"]/df["C1Mean"]))
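One thing to keep in mind is that np.where evaluates both branches for every row, so the quotient C2Mean/C1Mean is still computed (and then discarded) for rows where C1Mean is 0. A possible alternative, sketched here under the assumption that rows with C1Mean not greater than 0 should fall back to log2(C2Mean), is to replace those denominators with 1 first. The name safe_denominator is made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["a", "b", "c"],
                   "C2Mean": [2, 4, 6],
                   "C1Mean": [0, 2, 2.5]})

# Keep C1Mean where it is positive, otherwise use 1.
# Dividing by 1 leaves C2Mean unchanged, so log2(C2Mean / 1) == log2(C2Mean).
safe_denominator = df["C1Mean"].where(df["C1Mean"] > 0, 1)

# A single vectorized expression; no division by zero is ever performed.
df["log2FC"] = np.log2(df["C2Mean"] / safe_denominator)

print(df)
```

This does not help if C2Mean itself can be 0, since log2(0) is still -inf and NumPy may still warn about it.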

Update: timing

N = 10000
df = pd.DataFrame({"C2Mean":np.random.randint(0,10,N), 
                   "C1Mean":np.random.randint(0,10,N)})

%%timeit -n10
a = np.where(df["C1Mean"]==0,
             np.log2(df["C2Mean"]),
             np.log2(df["C2Mean"]/df["C1Mean"]))

1.06 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit -n10
b = df.apply(lambda x: np.log2(x["C2Mean"]/x["C1Mean"]) if x["C1Mean"]> 0 
                       else np.log2(x["C2Mean"]), axis=1)

248 ms ± 5.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The speedup is roughly 234x (248 ms / 1.06 ms).
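The %%timeit lines above are IPython/Jupyter cell magics and will not run in a plain Python script. A rough equivalent using the standard timeit module might look like the sketch below. The function names with_where and with_apply are hypothetical, and C2Mean is drawn from 1 to 9 here (an assumption that differs from the run above) so that log2(0) never occurs. Absolute timings depend on the machine and library versions, so no numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

N = 10000
rng = np.random.default_rng(0)
df = pd.DataFrame({"C2Mean": rng.integers(1, 10, N),   # 1..9, avoids log2(0)
                   "C1Mean": rng.integers(0, 10, N)})  # 0..9, includes zeros

def with_where():
    # Vectorized: both branches are computed for all rows, then selected.
    return np.where(df["C1Mean"] == 0,
                    np.log2(df["C2Mean"]),
                    np.log2(df["C2Mean"] / df["C1Mean"]))

def with_apply():
    # Row by row: the lambda is called N times.
    return df.apply(lambda x: np.log2(x["C2Mean"] / x["C1Mean"])
                    if x["C1Mean"] > 0 else np.log2(x["C2Mean"]), axis=1)

runs = 5
t_where = timeit.timeit(with_where, number=runs) / runs
t_apply = timeit.timeit(with_apply, number=runs) / runs

print(f"np.where: {t_where * 1e3:.2f} ms per run")
print(f"apply:    {t_apply * 1e3:.2f} ms per run")
print(f"ratio:    {t_apply / t_where:.0f}x")
```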

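Update 2: removing the RuntimeWarning

Just add the following at the start of the script. Note that this silences every RuntimeWarning in the process, not only the ones raised by this computation.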

import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning) 
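If silencing RuntimeWarning for the whole program is too broad, one option is to suppress the floating-point warnings only around the computation itself with np.errstate. This is a sketch rather than the answer's original method; the four-row frame below is made-up data that includes a C2Mean of 0 so that log2(0) actually occurs.

```python
import numpy as np
import pandas as pd

# Made-up data: the first row has C2Mean == 0, so log2(0) is evaluated.
df = pd.DataFrame({"C2Mean": [0, 2, 4, 6],
                   "C1Mean": [1, 0, 2, 2.5]})

# Ignore divide-by-zero and invalid-value warnings inside this block only.
# Outside the block, NumPy's normal warning behaviour is unchanged.
with np.errstate(divide="ignore", invalid="ignore"):
    log2fc = np.where(df["C1Mean"] == 0,
                      np.log2(df["C2Mean"]),
                      np.log2(df["C2Mean"] / df["C1Mean"]))

print(log2fc)
```

The values are unchanged by the suppression: the first row still comes out as -inf, because log2(0/1) is -inf whether or not the warning is shown.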

