
Creating new variable based on substring in another variable in Python

I am trying to create a binary (yes/no) variable based on the contents of a particular text string (in Python).

The data looks something like:

Person ID  Test Result
87         No exercise induced ischaemia
88         Treadmill test induced increased BP
89         NORMAL test on treadmill

and so on.

I need to pick out all the people who have "No exercise induced ischaemia". Can anybody shed some light on how to do this, given that the real data set has about 20 columns and about 14,000 rows that need to be searched?

Here's an example dataframe for convenience:

import pandas as pd

d = {'ID': [87, 88, 89, 90, 91, 92], 'TestResult': ["No exercise induced ischaemia", "NORMAL test on treadmill",  "No exercise induced ischaemia", "treadmill induced ischaemia", "NORMAL test on treadmill", "No exercise induced ischaemia"]}
df = pd.DataFrame(data=d)

I've tried things like

df['NegTest'] = df[df.TestResult.str.contains('No exercise induced ischaemia',case=True)]

with no luck.

Thanks for any help!

You're very close. Indexing with df[...] filters rows rather than producing a column of values, so just use np.where on the boolean mask to actually generate the yes/no:

import numpy as np
df['NegTest'] = np.where(df.TestResult.str.contains('No exercise induced ischaemia', case=True), 'yes', 'no')

Output:

>>> df
   ID                     TestResult NegTest
0  87  No exercise induced ischaemia     yes
1  88       NORMAL test on treadmill      no
2  89  No exercise induced ischaemia     yes
3  90    treadmill induced ischaemia      no
4  91       NORMAL test on treadmill      no
5  92  No exercise induced ischaemia     yes
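Since the question also asks about picking out the matching people, and a real data set of that size may well contain empty results, here is a minimal self-contained sketch of the same idea. It assumes a missing value should count as "no"; the `None` entry for ID 91 is a hypothetical stand-in for a missing result and is not part of the original sample data. The names `mask` and `negative` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [87, 88, 89, 90, 91, 92],
    "TestResult": [
        "No exercise induced ischaemia",
        "NORMAL test on treadmill",
        "No exercise induced ischaemia",
        "treadmill induced ischaemia",
        None,  # hypothetical missing result, not in the original data
        "No exercise induced ischaemia",
    ],
})

# Boolean mask: True where the phrase occurs in TestResult.
# regex=False treats the phrase as literal text rather than a pattern;
# na=False makes missing values count as "no match" instead of NaN.
mask = df["TestResult"].str.contains(
    "No exercise induced ischaemia", case=True, regex=False, na=False
)

# Turn the mask into a yes/no column.
df["NegTest"] = np.where(mask, "yes", "no")

# The same mask selects the matching rows directly.
negative = df[mask]
print(negative["ID"].tolist())
```

Keeping the mask in its own variable means the string search runs once and can be reused for both the new column and the row selection.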
