Assign values to DF rows based on portions of strings

Question

I have a DataFrame with one column; its values are the header names of another DataFrame. I am trying to assign weightings to them based on the strings contained in each row. They all have long names (class and subclass style) separated by underscores, for example: email_Trading Only, readership_unique_client, roadshow_NDR_Con_Call_Meetings, forum_meeting.

I would like to assign weights to these based on the strings that occur before, in between, or after the underscores.

I was thinking about creating a dictionary of sorts, but I am not sure how to loop through all the rows properly. Pseudocode here:

for i in rows: 
     if i contains 'email' #before first underscore
          then 0.5 #assigned to corresponding row in new column of DF

Sample data and output (based on the first string portion before the underscore):

                                TITLES   WEIGHTS     
2                        emp_full_name     0
3                      emp_office_code     0
4              emp_country_office_code     0
..                                 ...
171   forum_presentation_Platinum Plus     0.5
172  forum_presentation_Private Client     0.5
173          forum_presentation_Silver     0.5

Answer 1

See the user guide on how to test for strings that contain a pattern.

You can solve it with something like

df['WEIGHTS'] = df.TITLES.str.contains('email') * 0.5

Or create the column and then update it

df['WEIGHTS'] = 0.0
df.loc[df.TITLES.str.contains('email'), 'WEIGHTS'] = 0.5
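
A minimal, self-contained sketch of both variants. The three rows are made up for illustration, and the second column name `WEIGHTS_LOC` is only there so the two results can be compared side by side:

```python
import pandas as pd

# Illustrative data; the real frame would hold the other DataFrame's header names.
df = pd.DataFrame({"TITLES": ["emp_full_name", "email_Trading Only", "forum_meeting"]})

# Variant 1: the boolean mask times 0.5 gives 0.5 where matched, 0.0 elsewhere.
df["WEIGHTS"] = df["TITLES"].str.contains("email") * 0.5

# Variant 2: start from a float default, then overwrite only the matching rows.
df["WEIGHTS_LOC"] = 0.0
df.loc[df["TITLES"].str.contains("email"), "WEIGHTS_LOC"] = 0.5

print(df)
```

Starting the column at `0.0` rather than `0` keeps it a float column, so writing `0.5` into it does not require a dtype change.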

Update

.str.contains treats its pattern as a regular expression by default, so you can match alternatives, for example

df.loc[df.TITLES.str.contains('email|forum'), 'WEIGHTS'] = 0.5
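
Note that `str.contains` matches anywhere in the string, so a title such as `roadshow_email_list` (a hypothetical name) would also receive the weight. If only the part before the first underscore should count, as in the question, one option is `str.match`, which only matches at the start of each string. A sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {"TITLES": ["email_Trading Only", "forum_meeting", "roadshow_email_list", "emp_full_name"]}
)
df["WEIGHTS"] = 0.0

# (?:...) is a non-capturing group; the trailing underscore ensures the whole
# first segment is "email" or "forum", not merely a longer word starting with it.
starts_with_label = df["TITLES"].str.match(r"(?:email|forum)_")
df.loc[starts_with_label, "WEIGHTS"] = 0.5

print(df)
```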

You can also get the first part of the strings with

label = df.TITLES.str.split('_').str[0]

Then use a mapper with Series.replace, but you would need to include all possible prefixes, because any label missing from the mapping is left unchanged

df['WEIGHTS'] = label.replace({'email': 0.5, 'forum': 0.2 ...})
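
As an alternative to `Series.replace`, `Series.map` with a dictionary returns `NaN` for labels that are not in the mapping, so a default weight can be filled in afterwards instead of listing every prefix. A sketch, where the `weights` dictionary and its values are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"TITLES": ["emp_full_name", "email_Trading Only", "forum_presentation_Silver"]}
)

# Hypothetical weights per prefix; any prefix not listed falls back to 0.
weights = {"email": 0.5, "forum": 0.2}

# Take the segment before the first underscore, look it up, default to 0.
label = df["TITLES"].str.split("_").str[0]
df["WEIGHTS"] = label.map(weights).fillna(0.0)

print(df)
```

This is close to the dictionary idea in the question: the loop over rows is replaced by a vectorized lookup on the prefix.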

Answer 1: accepted, 2020-08-19 02:51:50
