Create a new variable based on a substring in another variable in Python

Question

I am trying to create a binary (yes/no) variable based on the contents of a particular text string (in Python).

The data looks something like:

Person ID	Test Result
87	No exercise induced ischaemia
88	Treadmill test induced increased BP
89	NORMAL test on treadmill

and so on.

I need to pick out all the people who have "No exercise induced ischaemia". Can anybody shed some light on how to do this, given that I have about 20 columns in the real data set and about 14000 rows that need to be searched?

Here's an example dataframe for convenience:

import pandas as pd

d = {'ID': [87, 88, 89, 90, 91, 92], 'TestResult': ["No exercise induced ischaemia", "NORMAL test on treadmill", "No exercise induced ischaemia", "treadmill induced ischaemia", "NORMAL test on treadmill", "No exercise induced ischaemia"]}
df = pd.DataFrame(data=d)

I've tried things like

df['NegTest'] = df[df.TestResult.str.contains('No exercise induced ischaemia',case=True)]

with no luck.

Thanks for any help!

Answer 1

You're very close. Just use np.where to actually generate the yes/no:

import numpy as np

df['NegTest'] = np.where(df.TestResult.str.contains('No exercise induced ischaemia', case=True), 'yes', 'no')

Output:

>>> df
   ID                     TestResult NegTest
0  87  No exercise induced ischaemia     yes
1  88       NORMAL test on treadmill      no
2  89  No exercise induced ischaemia     yes
3  90    treadmill induced ischaemia      no
4  91       NORMAL test on treadmill      no
5  92  No exercise induced ischaemia     yes
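
For what it's worth, the attempt in the question fails because df[...] with a boolean mask returns the filtered rows as a DataFrame, not a single column, so it cannot be assigned to df['NegTest']. Below is a minimal sketch of how the mask itself can be reused, both for the yes/no column and for picking out the matching people. The None entry is a hypothetical missing result that is not in the original data; it is only there to show what na=False does. regex=False is an optional extra that treats the search text as a literal string.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [87, 88, 89, 90],
    "TestResult": [
        "No exercise induced ischaemia",
        "NORMAL test on treadmill",
        None,  # hypothetical missing result, not in the original data
        "treadmill induced ischaemia",
    ],
})

# Boolean mask: True where the phrase occurs in TestResult.
# regex=False matches the text literally; na=False counts missing values as "no match".
mask = df["TestResult"].str.contains(
    "No exercise induced ischaemia", case=True, regex=False, na=False
)

# Yes/no column, built the same way as in the answer above.
df["NegTest"] = np.where(mask, "yes", "no")

# Only the matching rows, i.e. the people to pick out.
negative = df[mask]
print(negative["ID"].tolist())
```

If the real data has missing test results, leaving out na=False can leave missing values in the mask, which is usually not what you want when filtering.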

Answer 1
1 2022-02-02 00:32:28
