Searching a filtered dataframe for a specific string and then creating a new column based on the results (Python/Pandas)
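I am trying to filter my dataframe (hospital) to find the cases where the "brain bleeding" column is True. Then I want to search the brain_info column for a specific word ("cancer") and create a new column that holds that word ("cancer").
I have done this before without the filtering step, but I am having trouble in this case.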
#What I have (final diagnosis is the column I want to add)
| brain bleeding | brain info   | final diagnosis |
|----------------|--------------|-----------------|
| True           | BlahBlahBlah |                 |
| True           | Cancer       | Cancer          |
| False          | Cancer       |                 |
#Creating an empty column in my dataframe for the final diagnosis.
hospital["final_diagnosis"] = ""
#Filter cases where brain bleeding is True
filt = (hospital["brain_bleeding"] == True)
#Search for the filtered cases if the diagnosis contains "cancer" and adds it to the corresponding "final_diagnosis" cell, if it is there. This is where my error is?
hospital.loc[filt, 'brain_info'].str.contains("cancer", case=False, na=False), "final diagnosis"] = "cancer"
Can someone help me? Thanks.
Assuming your file is:
brain_bleeding brain_info
True BlahBlahBlah
True Cancer
False Cancer
You can try the following:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import pandas as pd
hospital = pd.read_csv('file.csv', sep='\t')
# add True to final_diagnosis column if brain is bleeding and brain info is cancer
hospital.loc[(hospital['brain_bleeding'] == True) &
             (hospital['brain_info'] == 'Cancer'), 'final_diagnosis'] = True
# replace NaN with empty strings (assign the result back instead of using inplace on a column selection)
hospital['final_diagnosis'] = hospital['final_diagnosis'].fillna('')
print(hospital)
Output:
brain_bleeding brain_info final_diagnosis
0 True BlahBlahBlah
1 True Cancer True
2 False Cancer
Note: I have added two conditions based on the final_diagnosis column in your example. It looks like you may need only one of them (feel free to delete either one if so).
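If the goal is the case-insensitive substring match described in the question rather than an exact comparison with 'Cancer', one possible sketch is to build both conditions as boolean Series and combine them with `&`, so that `.loc` receives a single row mask plus a column label. The small inline DataFrame below is only a stand-in for the real data, and the column names `brain_bleeding`, `brain_info` and `final_diagnosis` are assumed from the question.

```python
import pandas as pd

# Stand-in for the asker's data; values copied from the example table.
hospital = pd.DataFrame({
    "brain_bleeding": [True, True, False],
    "brain_info": ["BlahBlahBlah", "Cancer", "Cancer"],
})

# Start with an empty final_diagnosis column, as in the question.
hospital["final_diagnosis"] = ""

# Condition 1: rows where brain bleeding is True.
bleeding = hospital["brain_bleeding"] == True

# Condition 2: rows whose brain_info mentions "cancer"
# (case-insensitive; missing values count as "no match").
mentions_cancer = hospital["brain_info"].str.contains("cancer", case=False, na=False)

# Combine both masks and write "cancer" only where both hold.
hospital.loc[bleeding & mentions_cancer, "final_diagnosis"] = "cancer"

print(hospital)
```

Here only the second row satisfies both conditions, so it is the only one whose final_diagnosis is filled in. The third row mentions cancer but has brain_bleeding set to False, so it stays empty.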