How to create a column in a dataframe using filtering for string values from a list?
I have a dataframe of the following format (the actual dataframe contains more than 10,000 rows):
Occupation           Education
Engineer             High School
Neurosurgeon         Masters
Electrical Engineer  Masters
Mechanical Engineer  Masters
Software Engineer    Masters
Engineer             Masters
Business Executive   Masters
Sales Executive      Bachelors
Neurosurgeon         Masters
Electrical Engineer
Accountant           Bachelors
Sales Executive      Masters
I want to add a column based on selective filtering.
I need my result to look like this:
Occupation           Education    Welfare_Cost
Engineer             High School  50
Neurosurgeon         Masters      50
Electrical Engineer  Masters      100
Mechanical Engineer  Masters      100
Software Engineer    Masters      100
Engineer             Masters      100
Business Executive   Masters      100
Sales Executive      Bachelors    50
Neurosurgeon         Masters      50
Electrical Engineer               50
Accountant           Bachelors    50
Sales Executive      Masters      100
I only want to work on rows where the occupation contains a string from a list and Education is Masters. I tried to achieve this with the following code, but kept getting errors.
lis=['Engineer','Executive','Teacher']
df['Welfare_Cost']=np.where(((df['Education']=='Masters')&
(df['Occupation'].str.contains(i for i in lis))),
100,50)
I know I could also do this by running an iterative loop to build a list of values for each row and adding that list as a column, but I have many list combinations, so I am looking for a way to do this without an iterative loop.
Use join with | for regex OR, wrapping each word in \b ... \b for word boundaries:
import numpy as np

lis = ['Engineer', 'Executive', 'Teacher']
# Builds the alternation \bEngineer\b|\bExecutive\b|\bTeacher\b
pat = '|'.join(r"\b{}\b".format(x) for x in lis)
df['Welfare_Cost'] = np.where((df['Education'] == 'Masters') &
                              df['Occupation'].str.contains(pat),
                              100, 50)
Or, without word boundaries:
df['Welfare_Cost'] = np.where(((df['Education']=='Masters') &
(df['Occupation'].str.contains('|'.join(lis)))),
100,50)
print (df)
             Occupation    Education  Welfare_Cost
0              Engineer  High School            50
1          Neurosurgeon      Masters            50
2   Electrical Engineer      Masters           100
3   Mechanical Engineer      Masters           100
4     Software Engineer      Masters           100
5              Engineer      Masters           100
6    Business Executive      Masters           100
7       Sales Executive    Bachelors            50
8          Neurosurgeon      Masters            50
9   Electrical Engineer          NaN            50
10           Accountant    Bachelors            50
11      Sales Executive      Masters           100
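For readers who want to run the approach end to end, here is a minimal self-contained sketch. It rebuilds a few of the rows from the question and adds two precautions that are not part of the original answer and should be treated as assumptions: re.escape, in case a keyword ever contains a regex metacharacter, and na=False, so that a missing Occupation counts as "no match".

```python
import re

import numpy as np
import pandas as pd

# A handful of rows from the question; None marks the missing Education value.
df = pd.DataFrame({
    "Occupation": ["Engineer", "Neurosurgeon", "Electrical Engineer",
                   "Business Executive", "Sales Executive", "Electrical Engineer"],
    "Education": ["High School", "Masters", "Masters",
                  "Masters", "Bachelors", None],
})

lis = ["Engineer", "Executive", "Teacher"]

# re.escape is an added precaution (not in the original answer): it keeps
# characters such as "+" or "." in a keyword from being read as regex syntax.
pat = "|".join(r"\b{}\b".format(re.escape(x)) for x in lis)

is_masters = df["Education"].eq("Masters")
# na=False (also an addition) treats a missing Occupation as "no match".
has_keyword = df["Occupation"].str.contains(pat, na=False)

df["Welfare_Cost"] = np.where(is_masters & has_keyword, 100, 50)
print(df)
```

Naming the two boolean masks separately makes it easier to swap in other keyword lists, which may help when there are many list combinations.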
The difference is visible with changed data: \b ... \b matches whole words only, not substrings of longer words:
lis=['Engineer','Executive','Teacher']
df['Welfare_Cost1'] = np.where(((df['Education']=='Masters') &
(df['Occupation'].str.contains('|'.join(lis)))),
100,50)
pat = '|'.join(r"\b{}\b".format(x) for x in lis)
df['Welfare_Cost2'] = np.where(((df['Education']=='Masters') &
(df['Occupation'].str.contains(pat))),
100,50)
print (df)
              Occupation    Education  Welfare_Cost1  Welfare_Cost2
0               Engineer  High School             50             50
1           Neurosurgeon      Masters             50             50
2   Electrical Engineers      Masters            100             50
3    Mechanical Engineer      Masters            100            100
4      Software Engineer      Masters            100            100
5               Engineer      Masters            100            100
6     Business Executive      Masters            100            100
7        Sales Executive    Bachelors             50             50
8           Neurosurgeon      Masters             50             50
9    Electrical Engineer          NaN             50             50
10            Accountant    Bachelors             50             50
11      Sales Executives      Masters            100             50
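The effect of the word boundaries can also be checked in isolation with Python's standard re module, independent of pandas. This is only an illustrative sketch; the names plain, bounded, and results are made up for the example.

```python
import re

lis = ["Engineer", "Executive", "Teacher"]

# Plain alternation: matches the keyword anywhere, even inside a longer word.
plain = re.compile("|".join(lis))
# Bounded alternation: the keyword must stand alone as a whole word.
bounded = re.compile("|".join(r"\b{}\b".format(x) for x in lis))

results = {}
for text in ["Electrical Engineer", "Electrical Engineers", "Sales Executives"]:
    results[text] = (bool(plain.search(text)), bool(bounded.search(text)))
    print(text, results[text])
```

The plural forms match the plain pattern but not the bounded one, because the trailing "s" is a word character and so there is no boundary after the keyword.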
In your case the filter list contains only the essential part of the occupation name (semantically), so it would be enough to check with str.endswith:
df['Welfare_Cost'] = np.where((df['Education'] == 'Masters') &
                              df['Occupation'].str.endswith(tuple(lis)),
                              100, 50)
             Occupation    Education  Welfare_Cost
0              Engineer  High School            50
1          Neurosurgeon      Masters            50
2   Electrical Engineer      Masters           100
3   Mechanical Engineer      Masters           100
4     Software Engineer      Masters           100
5              Engineer      Masters           100
6    Business Executive      Masters           100
7       Sales Executive    Bachelors            50
8          Neurosurgeon      Masters            50
9   Electrical Engineer         None            50
10           Accountant    Bachelors            50
11      Sales Executive      Masters           100
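Note that str.endswith is stricter than str.contains: a keyword at the start or in the middle of the occupation no longer counts. The sketch below illustrates this on a few hypothetical occupation strings that are not in the original data. It assumes a pandas version whose Series.str.endswith accepts a tuple of suffixes, and it uses na=False so that a missing value becomes False.

```python
import pandas as pd

lis = ["Engineer", "Executive", "Teacher"]

# Hypothetical occupations chosen to show where endswith and contains disagree.
occupations = pd.Series(
    ["Software Engineer", "Executive Assistant", "Sales Executives", None]
)

# A tuple of suffixes matches if the string ends with any one of them.
ends = occupations.str.endswith(tuple(lis), na=False)
# For comparison: the plain regex alternation used earlier in this thread.
contains = occupations.str.contains("|".join(lis), na=False)

print(pd.DataFrame({"Occupation": occupations, "endswith": ends, "contains": contains}))
```

Only "Software Engineer" passes the endswith check, while all three non-missing strings pass the plain contains check, so which one fits depends on how the occupation names are written in the real data.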