Creating new variable based on substring in another variable in Python
I am trying to create a binary (yes/no) variable based on what is in a particular text string (in Python).
The data looks something like:

Person ID | Test Result
---|---
87 | No exercise induced ischaemia
88 | Treadmill test induced increased BP
89 | NORMAL test on treadmill

and so on.
I need to pick out all the people who have "No exercise induced ischaemia". Can anybody shed some light on how to do this, given that the real data set has about 20 columns and about 14,000 rows that need to be searched?
Here's an example dataframe for convenience:
import pandas as pd
d = {'ID': [87, 88, 89, 90, 91, 92], 'TestResult': ["No exercise induced ischaemia", "NORMAL test on treadmill", "No exercise induced ischaemia", "treadmill induced ischaemia", "NORMAL test on treadmill", "No exercise induced ischaemia"]}
df = pd.DataFrame(data=d)
I've tried things like
df['NegTest'] = df[df.TestResult.str.contains('No exercise induced ischaemia',case=True)]
with no luck.
Thanks for any help!
You're very close. Your attempt indexes df with the boolean mask, which selects the matching rows rather than producing one value per row. Just use np.where to actually generate the yes/no:
import numpy as np
df['NegTest'] = np.where(df.TestResult.str.contains('No exercise induced ischaemia', case=True), 'yes', 'no')
Output:
>>> df
   ID                     TestResult NegTest
0  87  No exercise induced ischaemia     yes
1  88       NORMAL test on treadmill      no
2  89  No exercise induced ischaemia     yes
3  90    treadmill induced ischaemia      no
4  91       NORMAL test on treadmill      no
5  92  No exercise induced ischaemia     yes
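If the real column contains missing values, str.contains returns missing entries for those rows by default, which may not be what you want in a yes/no flag. The following is a minimal sketch of one way to handle that, not the only approach. The row with a missing result and the column name NegTestFlag are assumptions added for illustration; they are not part of the original data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [87, 88, 89, 90],
    "TestResult": [
        "No exercise induced ischaemia",
        "NORMAL test on treadmill",
        None,  # hypothetical missing result, added for illustration
        "treadmill induced ischaemia",
    ],
})

# Boolean mask: True where the phrase occurs in the text.
# regex=False treats the phrase as literal text rather than a pattern;
# na=False makes rows with a missing result count as "no match".
mask = df["TestResult"].str.contains(
    "No exercise induced ischaemia", case=True, regex=False, na=False
)

df["NegTest"] = np.where(mask, "yes", "no")  # yes/no label per row
df["NegTestFlag"] = mask.astype(int)         # 1/0 version (hypothetical column name)
negatives = df[mask]                         # only the rows that match
```

Keeping the mask in its own variable lets the same condition drive both the new column and the row filter, which is closer to what the original attempt was reaching for.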