
How to create a column in a dataframe using filtering for string values from a list?

I have a dataframe in the following format (the actual dataframe contains more than 10,000 rows):

Occupation                  Education
Engineer                    High School    
Neurosurgeon                Masters
Electrical Engineer         Masters
Mechanical Engineer         Masters
Software Engineer           Masters
Engineer                    Masters
Business Executive          Masters
Sales Executive             Bachelors
Neurosurgeon                Masters
Electrical Engineer
Accountant                  Bachelors
Sales Executive             Masters

I want to add a column based on selective filtering.

I need my result to look like this:

Occupation                  Education               Welfare_Cost
Engineer                    High School             50 
Neurosurgeon                Masters                 50
Electrical Engineer         Masters                 100
Mechanical Engineer         Masters                 100
Software Engineer           Masters                 100
Engineer                    Masters                 100
Business Executive          Masters                 100
Sales Executive             Bachelors               50
Neurosurgeon                Masters                 50
Electrical Engineer                                 50
Accountant                  Bachelors               50 
Sales Executive             Masters                 100

I only want to work on rows where the occupation contains a string from a list and Education is Masters. I tried to achieve this with the following code, but I kept getting errors.


lis=['Engineer','Executive','Teacher']

df['Welfare_Cost']=np.where(((df['Education']=='Masters')&
                        (df['Occupation'].str.contains(i for i in lis))),        
                      100,50)
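The attempt above fails because Series.str.contains expects a single pattern (a string or compiled regex), not a generator of strings; with the default regex=True the pattern is handed to re.compile, which rejects a generator with a TypeError. A minimal sketch reproducing this on a hypothetical two-row Series, assuming only pandas:

```python
import pandas as pd

lis = ["Engineer", "Executive", "Teacher"]
occupation = pd.Series(["Software Engineer", "Accountant"])  # hypothetical sample values

# Passing a generator instead of one pattern string raises TypeError
try:
    occupation.str.contains(i for i in lis)
    error_type = None
except TypeError as exc:
    error_type = type(exc)

# Joining the list into one alternation pattern gives str.contains a single string
matches = occupation.str.contains("|".join(lis))
print(error_type, matches.tolist())
```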

I know I could also do this by looping over the rows to build a list and then adding that list as a column, but I have many list combinations, so I am looking for a way to do this without an iterative loop.

Use join with | (regex OR) to build a single pattern, wrapping each value in \b...\b so that it only matches as a whole word:

import numpy as np

lis=['Engineer','Executive','Teacher']

pat = '|'.join(r"\b{}\b".format(x) for x in lis)

df['Welfare_Cost'] = np.where(((df['Education']=='Masters') & 
                              (df['Occupation'].str.contains(pat))),
                              100,50)
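A self-contained version of the snippet above, assuming the question's sample data (with None standing in for the missing Education value). Two small additions are not part of the original answer and are assumptions on my part: re.escape guards against regex metacharacters in the list items, and na=False makes a missing occupation count as "no match" instead of NaN.

```python
import re

import numpy as np
import pandas as pd

# Sample data from the question (None marks the missing Education value)
df = pd.DataFrame({
    "Occupation": ["Engineer", "Neurosurgeon", "Electrical Engineer",
                   "Mechanical Engineer", "Software Engineer", "Engineer",
                   "Business Executive", "Sales Executive", "Neurosurgeon",
                   "Electrical Engineer", "Accountant", "Sales Executive"],
    "Education": ["High School", "Masters", "Masters", "Masters", "Masters",
                  "Masters", "Masters", "Bachelors", "Masters", None,
                  "Bachelors", "Masters"],
})

lis = ["Engineer", "Executive", "Teacher"]

# One alternation pattern; each (escaped) term is wrapped in word boundaries
pat = "|".join(r"\b{}\b".format(re.escape(x)) for x in lis)

is_masters = df["Education"].eq("Masters")                    # None compares as False
has_keyword = df["Occupation"].str.contains(pat, na=False)    # assumption: treat NaN as no match

df["Welfare_Cost"] = np.where(is_masters & has_keyword, 100, 50)
print(df)
```

Because the whole list is compiled into one pattern, the filter stays vectorized no matter how many list combinations you need: only lis changes.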

Or:

df['Welfare_Cost'] = np.where(((df['Education']=='Masters') & 
                              (df['Occupation'].str.contains('|'.join(lis)))),
                              100,50)

print (df)
             Occupation  Education  Welfare_Cost
0         Engineer High     School            50
1          Neurosurgeon    Masters            50
2   Electrical Engineer    Masters           100
3   Mechanical Engineer    Masters           100
4     Software Engineer    Masters           100
5              Engineer    Masters           100
6    Business Executive    Masters           100
7       Sales Executive  Bachelors            50
8          Neurosurgeon    Masters            50
9   Electrical Engineer        NaN            50
10           Accountant  Bachelors            50
11      Sales Executive    Masters           100

The difference shows up with slightly changed data (Electrical Engineers, Sales Executives): the plain join also matches substrings, while \b...\b matches only whole words:

lis=['Engineer','Executive','Teacher']

df['Welfare_Cost1'] = np.where(((df['Education']=='Masters') & 
                              (df['Occupation'].str.contains('|'.join(lis)))),
                              100,50)

pat = '|'.join(r"\b{}\b".format(x) for x in lis)

df['Welfare_Cost2'] = np.where(((df['Education']=='Masters') & 
                              (df['Occupation'].str.contains(pat))),
                              100,50)

print (df)
              Occupation  Education  Welfare_Cost1  Welfare_Cost2
0          Engineer High     School             50             50
1           Neurosurgeon    Masters             50             50
2   Electrical Engineers    Masters            100             50
3    Mechanical Engineer    Masters            100            100
4      Software Engineer    Masters            100            100
5               Engineer    Masters            100            100
6     Business Executive    Masters            100            100
7        Sales Executive  Bachelors             50             50
8           Neurosurgeon    Masters             50             50
9    Electrical Engineer        NaN             50             50
10            Accountant  Bachelors             50             50
11      Sales Executives    Masters            100             50
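To see the two patterns side by side, here is a small sketch on a hypothetical three-value Series (illustrative values, not the question's data): the plain alternation finds Engineer inside Engineers, while the word-boundary version does not, because the trailing "s" is a word character and so there is no boundary after "Engineer".

```python
import pandas as pd

lis = ["Engineer", "Executive", "Teacher"]
occupation = pd.Series(["Electrical Engineers", "Sales Executive", "Accountant"])

plain_pat = "|".join(lis)                               # Engineer|Executive|Teacher
word_pat = "|".join(r"\b{}\b".format(x) for x in lis)   # \bEngineer\b|\bExecutive\b|\bTeacher\b

plain = occupation.str.contains(plain_pat).tolist()
word = occupation.str.contains(word_pat).tolist()
print(plain)  # [True, True, False]
print(word)   # [False, True, False]
```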

In your case the filter list contains only the essential part of each occupation name (semantically, its last word), so it would be enough to check with str.endswith:

df['Welfare_Cost']=np.where((df['Education']=='Masters') & df['Occupation'].str.endswith(tuple(lis)),100,50)

             Occupation    Education  Welfare_Cost
0              Engineer  High School            50
1          Neurosurgeon      Masters            50
2   Electrical Engineer      Masters           100
3   Mechanical Engineer      Masters           100
4     Software Engineer      Masters           100
5              Engineer      Masters           100
6    Business Executive      Masters           100
7       Sales Executive    Bachelors            50
8          Neurosurgeon      Masters            50
9   Electrical Engineer         None            50
10           Accountant    Bachelors            50
11      Sales Executive      Masters           100
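A self-contained sketch of the endswith approach, using a few rows from the question plus one hypothetical row ("Executive Assistant"). str.endswith does a literal, non-regex suffix check and accepts a single string or a tuple of strings, which is why the list is converted with tuple(lis). Because only suffixes match, the hypothetical "Executive Assistant" is not matched, whereas str.contains would match it; whether that is desirable depends on your data.

```python
import numpy as np
import pandas as pd

lis = ["Engineer", "Executive", "Teacher"]

# Rows from the question plus one hypothetical row ("Executive Assistant")
df = pd.DataFrame({
    "Occupation": ["Engineer", "Neurosurgeon", "Software Engineer",
                   "Sales Executive", "Sales Executive", "Executive Assistant"],
    "Education": ["High School", "Masters", "Masters",
                  "Bachelors", "Masters", "Masters"],
})

# Literal suffix check against any of the keywords
ends_with_keyword = df["Occupation"].str.endswith(tuple(lis))

df["Welfare_Cost"] = np.where(df["Education"].eq("Masters") & ends_with_keyword, 100, 50)
print(df)
```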
