How to create a column in a dataframe using string values from a filter list?

Question

I have a dataframe in the following format (the actual dataframe contains more than 10,000 rows):

Occupation                  Education
Engineer                    High School    
Neurosurgeon                Masters
Electrical Engineer         Masters
Mechanical Engineer         Masters
Software Engineer           Masters
Engineer                    Masters
Business Executive          Masters
Sales Executive             Bachelors
Neurosurgeon                Masters
Electrical Engineer
Accountant                  Bachelors
Sales Executive             Masters

I want to add a column based on selective filtering.

I need my result to look like this:

Occupation                  Education               Welfare_Cost
Engineer                    High School             50 
Neurosurgeon                Masters                 50
Electrical Engineer         Masters                 100
Mechanical Engineer         Masters                 100
Software Engineer           Masters                 100
Engineer                    Masters                 100
Business Executive          Masters                 100
Sales Executive             Bachelors               50
Neurosurgeon                Masters                 50
Electrical Engineer                                 50
Accountant                  Bachelors               50 
Sales Executive             Masters                 100

I only want to flag rows where Occupation contains a string from a list and Education is Masters. I tried to achieve this with the following code, but kept getting errors.


lis=['Engineer','Executive','Teacher']

df['Welfare_Cost']=np.where(((df['Education']=='Masters')&
                        (df['Occupation'].str.contains(i for i in lis))),        
                      100,50)

I know I could also do this with an iterative loop that builds a list of values row by row and adds that list as a column, but I have many list combinations, so I am looking for a way to do this without an iterative loop.

Answer 1

Join the list items with | (regex OR), wrapping each item in \b...\b for word boundaries:

lis=['Engineer','Executive','Teacher']

pat = '|'.join(r"\b{}\b".format(x) for x in lis)

df['Welfare_Cost'] = np.where(((df['Education']=='Masters') & 
                              (df['Occupation'].str.contains(pat))),
                              100,50)

Or, without word boundaries:

df['Welfare_Cost'] = np.where(((df['Education']=='Masters') & 
                              (df['Occupation'].str.contains('|'.join(lis)))),
                              100,50)

print (df)
             Occupation  Education  Welfare_Cost
0         Engineer High     School            50
1          Neurosurgeon    Masters            50
2   Electrical Engineer    Masters           100
3   Mechanical Engineer    Masters           100
4     Software Engineer    Masters           100
5              Engineer    Masters           100
6    Business Executive    Masters           100
7       Sales Executive  Bachelors            50
8          Neurosurgeon    Masters            50
9   Electrical Engineer        NaN            50
10           Accountant  Bachelors            50
11      Sales Executive    Masters           100
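
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the question's table (with the missing Education value as None), and two things are my own additions rather than part of the answer above: re.escape, in case a list item ever contains regex metacharacters, and na=False, so that a missing Occupation counts as "no match".

```python
import re

import numpy as np
import pandas as pd

# Sample data rebuilt from the question; the missing Education value is None.
df = pd.DataFrame({
    "Occupation": ["Engineer", "Neurosurgeon", "Electrical Engineer",
                   "Mechanical Engineer", "Software Engineer", "Engineer",
                   "Business Executive", "Sales Executive", "Neurosurgeon",
                   "Electrical Engineer", "Accountant", "Sales Executive"],
    "Education": ["High School", "Masters", "Masters", "Masters", "Masters",
                  "Masters", "Masters", "Bachelors", "Masters", None,
                  "Bachelors", "Masters"],
})

lis = ["Engineer", "Executive", "Teacher"]

# Addition (not in the answer): escape each item so it is matched literally.
pat = "|".join(r"\b{}\b".format(re.escape(x)) for x in lis)

# Addition (not in the answer): na=False treats a missing Occupation as no match.
mask = (df["Education"] == "Masters") & df["Occupation"].str.contains(pat, na=False)

df["Welfare_Cost"] = np.where(mask, 100, 50)
print(df)
```

Building the boolean mask on its own line also makes it easy to reuse the same condition for other list combinations.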

The difference shows up in modified data: with \b...\b only whole words match, so 'Engineers' and 'Executives' are not matched:

lis=['Engineer','Executive','Teacher']

df['Welfare_Cost1'] = np.where(((df['Education']=='Masters') & 
                              (df['Occupation'].str.contains('|'.join(lis)))),
                              100,50)

pat = '|'.join(r"\b{}\b".format(x) for x in lis)

df['Welfare_Cost2'] = np.where(((df['Education']=='Masters') & 
                              (df['Occupation'].str.contains(pat))),
                              100,50)

print (df)
              Occupation  Education  Welfare_Cost1  Welfare_Cost2
0          Engineer High     School             50             50
1           Neurosurgeon    Masters             50             50
2   Electrical Engineers    Masters            100             50
3    Mechanical Engineer    Masters            100            100
4      Software Engineer    Masters            100            100
5               Engineer    Masters            100            100
6     Business Executive    Masters            100            100
7        Sales Executive  Bachelors             50             50
8           Neurosurgeon    Masters             50             50
9    Electrical Engineer        NaN             50             50
10            Accountant  Bachelors             50             50
11      Sales Executives    Masters            100             50
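
The same effect can be checked without pandas. This is a small sketch using the standard re module, on the assumption that str.contains with its default regex=True behaves like re.search on each value; the names plain, bounded and results are only illustrative.

```python
import re

lis = ["Engineer", "Executive", "Teacher"]

# Pattern without word boundaries: matches the items anywhere, even inside longer words.
plain = re.compile("|".join(lis))

# Pattern with word boundaries: each item must appear as a whole word.
bounded = re.compile("|".join(r"\b{}\b".format(x) for x in lis))

results = {}
for text in ["Electrical Engineer", "Electrical Engineers", "Sales Executives"]:
    results[text] = (bool(plain.search(text)), bool(bounded.search(text)))
    print(text, results[text])
```

Only the singular 'Electrical Engineer' matches both patterns; the plural forms match the plain pattern but not the bounded one, which is what the Welfare_Cost1 and Welfare_Cost2 columns show.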

Answer 2

In your case the filter list contains only the semantically essential part of each occupation name, and that part comes last, so checking with str.endswith is enough:

df['Welfare_Cost']=np.where((df['Education']=='Masters') & df['Occupation'].str.endswith(tuple(lis)),100,50)

             Occupation    Education  Welfare_Cost
0              Engineer  High School            50
1          Neurosurgeon      Masters            50
2   Electrical Engineer      Masters           100
3   Mechanical Engineer      Masters           100
4     Software Engineer      Masters           100
5              Engineer      Masters           100
6    Business Executive      Masters           100
7       Sales Executive    Bachelors            50
8          Neurosurgeon      Masters            50
9   Electrical Engineer         None            50
10           Accountant    Bachelors            50
11      Sales Executive      Masters           100
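
To see where str.endswith and str.contains diverge, here is a small sketch on made-up values (the Series occ is hypothetical and not taken from the question). It assumes a pandas version whose str.endswith accepts a tuple, as the line above does, and na=False is my addition so that a missing value counts as no match.

```python
import pandas as pd

lis = ["Engineer", "Executive", "Teacher"]

# Hypothetical occupations chosen to show the difference between the two checks.
occ = pd.Series(["Software Engineer", "Engineer in Training", "Sales Executives", None])

# endswith: the value must end with one of the list items exactly.
ends = occ.str.endswith(tuple(lis), na=False)

# contains: a list item may appear anywhere in the value.
has = occ.str.contains("|".join(lis), na=False)

print(pd.DataFrame({"Occupation": occ, "endswith": ends, "contains": has}))
```

endswith is stricter: it rejects 'Engineer in Training' and 'Sales Executives', both of which the plain contains pattern accepts.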

Answer 1
0 Accepted 2023-01-31 06:29:45

Answer 2
0 2023-01-31 06:45:30
