Python add new column values based on multiple conditions in another dataframe
Populating the values of a column based on multiple conditions to a new column of a dataframe
Suppose I have the following dataframe:
df.head()
col1 col2 col3 start end gs
chr1 HAS GEN 11869 14409 DDX
chr1 HAS TRANS 11869 14409 NaN
chr1 HAS EX 11869 12227 NaN
chr1 HAS GEN 12613 12721 FXBZ
chr1 HAS EX 13221 14409 NaN
chr1 HAS EX 12010 12057 NaN
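Now I need to add a new column based on two conditions, and its values must come from an existing column.
For example, the condition is that col3 equals GEN or EX. When it holds, a new column col7 should be added using the values from column gs. The value taken from gs must always be the one from the row where col3 equals GEN, so it is never NaN. In the end, my goal is a dataframe like this: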
col1 col2 col3 start end gs col7
chr1 HAS GEN 11869 14409 DDX DDX
chr1 HAS EX 11869 12227 NaN DDX
chr1 HAS TRANS 11869 14409 NaN no
chr1 HAS GEN 12613 12721 FXBZ FXBZ
chr1 HAS EX 13221 14409 NaN FXBZ
chr1 HAS EX 12010 12057 NaN FXBZ
I tried using a lambda:
df.apply(
lambda row: row['gs'] if (row['col3'] =="EX" and row['gs'] !=NaN) else "no",
axis=1)
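However, I could not get the values from the gs column into the new column. It sets NaN values instead, which is not what I want.
Any suggestions are much appreciated!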
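One likely reason the attempt above yields NaN: a comparison such as `value != NaN` is always true, because NaN is unequal to everything, including itself. The EX rows therefore pass the test and return `row['gs']`, which is NaN for those rows, while GEN rows fall into the "no" branch. A minimal sketch, assuming `NaN` in the attempt stands for `numpy.nan` (the snippet does not show where it comes from) and using a shortened, made-up three-row frame:

```python
import numpy as np
import pandas as pd

# Shortened illustrative frame (not the full data from the question).
df = pd.DataFrame({
    "col3": ["GEN", "TRANS", "EX"],
    "gs": ["DDX", np.nan, np.nan],
})

# NaN never compares equal to anything, so `!=` against NaN is always True.
nan_check = np.nan != np.nan

# The attempt from the question, with NaN spelled as np.nan (an assumption).
attempt = df.apply(
    lambda row: row["gs"] if (row["col3"] == "EX" and row["gs"] != np.nan) else "no",
    axis=1,
)

# Series.notna is the reliable way to test for missing values.
has_value = df["gs"].notna()

print(attempt)
print(has_value)
```

Even with a correct missing-value test, a row-wise lambda only sees its own row, so it cannot reach the gs value of the preceding GEN row.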
I believe you can use numpy.where with a Series.isin condition and forward fill the missing values in the gs column:
import numpy as np

# Keep the forward-filled gs value for GEN/EX rows, otherwise 'no'
df['col7'] = np.where(df['col3'].isin(['GEN','EX']), df['gs'].ffill(), 'no')
print(df)
col1 col2 col3 start end gs col7
0 chr1 HAS GEN 11869 14409 DDX DDX
1 chr1 HAS EX 11869 14409 NaN DDX
2 chr1 HAS TRANS 11869 12227 NaN no
3 chr1 HAS GEN 12613 12721 FXBZ FXBZ
4 chr1 HAS EX 13221 14409 NaN FXBZ
5 chr1 HAS EX 12010 12057 NaN FXBZ
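For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the frame from the question's df.head() listing and applies the same np.where / isin / ffill combination. The rows follow the question's input order, so the TRANS row sits at index 1 here; the column dtypes are an assumption, since the question shows only printed text.

```python
import numpy as np
import pandas as pd

# Rebuilt from the df.head() listing in the question.
df = pd.DataFrame({
    "col1": ["chr1"] * 6,
    "col2": ["HAS"] * 6,
    "col3": ["GEN", "TRANS", "EX", "GEN", "EX", "EX"],
    "start": [11869, 11869, 11869, 12613, 13221, 12010],
    "end": [14409, 14409, 12227, 12721, 14409, 12057],
    "gs": ["DDX", np.nan, np.nan, "FXBZ", np.nan, np.nan],
})

# True for rows whose col3 is GEN or EX.
mask = df["col3"].isin(["GEN", "EX"])

# Forward fill gs so each row carries the last seen non-missing value,
# then keep it only where the mask is True; other rows get 'no'.
df["col7"] = np.where(mask, df["gs"].ffill(), "no")

print(df)
```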
Details:
print(df['gs'].ffill())
0 DDX
1 DDX
2 DDX
3 FXBZ
4 FXBZ
5 FXBZ
Name: gs, dtype: object
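One caveat: a plain ffill carries forward whatever non-missing gs value came last, whichever row type it belongs to. In the sample data only GEN rows have a gs value, so that is fine. If other rows could carry one too, a possible variant is to blank out gs on non-GEN rows first with Series.where and then forward fill. The sketch below uses hypothetical data (the "OTHER" value on the TRANS row is invented for illustration) to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the TRANS row carries its own gs value.
df = pd.DataFrame({
    "col3": ["GEN", "TRANS", "EX", "GEN", "EX"],
    "gs": ["DDX", "OTHER", np.nan, "FXBZ", np.nan],
})

mask = df["col3"].isin(["GEN", "EX"])

# Plain forward fill: the EX row at index 2 inherits "OTHER" from the TRANS row.
plain = np.where(mask, df["gs"].ffill(), "no")

# Keep gs only on GEN rows (others become NaN), then forward fill,
# so every row inherits the value of the nearest preceding GEN row.
gen_only = df["gs"].where(df["col3"].eq("GEN")).ffill()
strict = np.where(mask, gen_only, "no")

print(plain)
print(strict)
```

Both variants assume the rows are already ordered so that each EX row follows the GEN row it belongs to.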