
Update value in one column if the string in another column contains something in a list

  id name             gender
0 13 John Smith       0
1 46 Jim Jeffries     2
2 75 Jennifer Johnson 0
3 37 Sam Adams        0
4 24 John Cleese      0
5 17 Taika Waititi    0

I have a lot of people's names and genders in a df, taken from a film actors' database. Genders were assigned 1 (female), 2 (male), or 0 (not listed). I'd like to comb through and callously assume genders by name. The names would be stored in lists and filled out manually. If I spot somebody with a gender-nonspecific name, I may look them up by ID and find out myself whether they are male or female; I'd like to inject that as well:

m_names = ['John', ...]
f_names = ['Jennifer', ...]
m_ids   = ['37', ...]
f_ids   = ['', ...]

I have a good handle on for loops and np.where, but I can't figure out how to go through this df row by row.

Using the lists above, here is my attempt, followed by the result I want it to return:

for index, row in df.iterrows():
  if row['gender'] == 0:
    if   row['name'].str.contains(' |'.join(f_names)) or row['id'].str.contains('|'.join(f_ids)):
      return 1
    elif row['name'].str.contains(' |'.join(m_names)) or row['id'].str.contains('|'.join(m_ids)):
      return 2
print(df)

  id name             gender
0 13 John Smith       2
1 46 Jim Jeffries     2
2 75 Jennifer Johnson 1
3 37 Sam Adams        2
4 24 John Cleese      2
5 17 Taika Waititi    0

Note the space before '|' in the name condition; it is there to avoid matching parts of last names.
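One caveat with the space trick: ' |'.join only puts a space between items, so the last name in the list has no trailing space and can still match inside a surname. The sketch below shows this on a few made-up names (the series and list here are illustrative, not from the question) and an alternative that anchors the pattern to the first word with ^ and \b; re.escape guards against names containing regex metacharacters.

```python
import re

import pandas as pd

# Illustrative data: "Anna Jimenez" should NOT count as a "Jim".
names = pd.Series(["John Smith", "Jim Jeffries", "Anna Jimenez", "Johnny Cash"])
m_names = ["John", "Jim"]

# The question's approach: the space is only added between items,
# so the final entry ("Jim") is unprotected.
loose = " |".join(m_names)  # 'John |Jim'
print(names.str.contains(loose).tolist())   # [True, True, True, False]

# Anchored alternative: a listed name must be the first word of the string.
strict = r"^(?:" + "|".join(map(re.escape, m_names)) + r")\b"
print(names.str.contains(strict).tolist())  # [True, True, False, False]
```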

At this point, I'm running into a wall with how I've structured my if statements. Python doesn't like it and says my 'return's are 'outside function'. If I change these to

row['gender'] = #

I run into issues with unicode and my use of str and contains.
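For reference, a row-by-row version can be made to work; this is a sketch, not the approach the answers recommend. Two things trip up the loop above: return is only valid inside a function, and inside iterrows row['name'] is a plain Python str, which has no .str accessor. Assigning to row is also not guaranteed to write back to df, so the sketch writes with df.at instead. It assumes the ids are stored as strings, that the lists hold only the entries shown (the '...' is omitted), and that f_ids is empty rather than [''].

```python
import pandas as pd

# Sample frame rebuilt from the table in the question (ids assumed to be strings).
df = pd.DataFrame({
    "id": ["13", "46", "75", "37", "24", "17"],
    "name": ["John Smith", "Jim Jeffries", "Jennifer Johnson",
             "Sam Adams", "John Cleese", "Taika Waititi"],
    "gender": [0, 2, 0, 0, 0, 0],
})

# Assumed list contents: only the entries shown in the question.
m_names, f_names = ["John"], ["Jennifer"]
m_ids, f_ids = ["37"], []


def first_name(full_name):
    """Return the first whitespace-separated word, or '' for a blank name."""
    parts = full_name.split()
    return parts[0] if parts else ""


for index, row in df.iterrows():
    if row["gender"] != 0:
        continue  # leave genders that are already listed
    first = first_name(row["name"])  # plain str methods, no .str accessor
    if first in f_names or row["id"] in f_ids:
        df.at[index, "gender"] = 1  # write to df itself, not to the row copy
    elif first in m_names or row["id"] in m_ids:
        df.at[index, "gender"] = 2

print(df)
```

With these lists the gender column becomes 2, 2, 1, 2, 2, 0, matching the desired output. For large frames the vectorized answers are usually much faster.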

It seems you need np.select and no for loops:

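import numpy as np

# Rows matching neither condition keep their current gender
# (a scalar such as default=3 would overwrite existing values).
df['gender'] = np.select([df.name.str.contains(" |".join(m_names)),
                          df.name.str.contains(" |".join(f_names))],
                         [2, 1],
                         default=df['gender'])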
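A self-contained run of the np.select idea on the sample data, extended so that it only touches rows whose gender is 0 and also checks the id lists. This is a sketch under assumptions: ids are strings, the lists contain only the shown entries, and the id checks use Series.isin (exact match) rather than str.contains, because an empty id list would join to an empty regex that matches every row. Conditions are evaluated in order, and rows matching none of them keep their current value via default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": ["13", "46", "75", "37", "24", "17"],
    "name": ["John Smith", "Jim Jeffries", "Jennifer Johnson",
             "Sam Adams", "John Cleese", "Taika Waititi"],
    "gender": [0, 2, 0, 0, 0, 0],
})

# Assumed list contents (the '...' from the question is left out).
m_names, f_names = ["John"], ["Jennifer"]
m_ids, f_ids = ["37"], []

unlisted = df["gender"].eq(0)  # only fill in missing genders

female = unlisted & (df["name"].str.contains(" |".join(f_names))
                     | df["id"].isin(f_ids))
male = unlisted & (df["name"].str.contains(" |".join(m_names))
                   | df["id"].isin(m_ids))

# First matching condition wins; everything else keeps its existing value.
df["gender"] = np.select([female, male], [1, 2], default=df["gender"])
print(df)
```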

You could use the pandas method isin:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.isin.html

df.loc[df.name.isin(m_names), 'gender'] = 2
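Series.isin compares whole values, so a full name such as 'John Smith' never equals 'John'. One way to keep the isin approach, sketched here under the same assumptions as the sample frame in the question (string ids, lists limited to the shown entries), is to compare the first word of each name and the ids directly. Exact matching on ids also avoids '37' accidentally matching an id like '137'.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["13", "46", "75", "37", "24", "17"],
    "name": ["John Smith", "Jim Jeffries", "Jennifer Johnson",
             "Sam Adams", "John Cleese", "Taika Waititi"],
    "gender": [0, 2, 0, 0, 0, 0],
})
m_names, f_names = ["John"], ["Jennifer"]
m_ids = ["37"]

# Whole-value comparison: no full name equals "John".
print(df["name"].isin(m_names).any())  # False

first = df["name"].str.split().str[0]  # first word of each name
unlisted = df["gender"].eq(0)          # computed once, before any assignment

df.loc[unlisted & first.isin(f_names), "gender"] = 1
df.loc[unlisted & (first.isin(m_names) | df["id"].isin(m_ids)), "gender"] = 2
print(df)
```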

You can first construct and combine Boolean masks. For example:

m_zero = df['gender'].eq(0)

m_name_female = df['name'].str.contains(' |'.join(f_names))
m_name_male = df['name'].str.contains(' |'.join(m_names))

# astype(str) lets .str work even if id is stored as integers
m_id_female = df['id'].astype(str).str.contains('|'.join(f_ids))
m_id_male = df['id'].astype(str).str.contains('|'.join(m_ids))

female_mask = m_zero & (m_name_female | m_id_female)
male_mask = m_zero & (m_name_male | m_id_male)
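A pitfall with building these masks from the question's lists: f_ids = [''] joins to the empty string, and an empty regex matches every value, so every unlisted row would be flagged female. The helpers below are hypothetical (not part of pandas): they drop empty entries, escape regex metacharacters, and return an all-False mask when nothing is left to search for.

```python
import re

import pandas as pd

ids = pd.Series(["13", "46", "75", "37", "24", "17"])

# The empty pattern matches everything.
print(ids.str.contains("|".join([""])).all())  # True


def build_pattern(items):
    """Join non-empty items into an alternation regex, or return None."""
    kept = [re.escape(item) for item in items if item]
    return "|".join(kept) if kept else None


def contains_any(series, items):
    """Boolean mask: True where the series contains any non-empty item."""
    pattern = build_pattern(items)
    if pattern is None:
        return pd.Series(False, index=series.index)
    return series.str.contains(pattern)


print(contains_any(ids, [""]).any())       # False
print(contains_any(ids, ["37"]).tolist())  # [False, False, False, True, False, False]
```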

Then apply the logic via pd.DataFrame.loc:

df.loc[female_mask, 'gender'] = 1
df.loc[male_mask, 'gender'] = 2

Or use a nested numpy.where:

df['gender'] = np.where(female_mask, 1, np.where(male_mask, 2, df['gender']))
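The .loc and nested np.where versions agree unless a row matches both masks (for example an ambiguous first name that appears in both lists). With .loc the later assignment wins; with nested np.where the outer condition wins, and np.select likewise uses the first condition that is true. The masks below are made up purely to show the difference.

```python
import numpy as np
import pandas as pd

gender = pd.Series([0, 0, 0])
# Illustrative masks: row 0 matches BOTH lists.
female_mask = pd.Series([True, True, False])
male_mask = pd.Series([True, False, False])

# .loc: female is assigned first, then male overwrites row 0.
via_loc = gender.copy()
via_loc.loc[female_mask] = 1
via_loc.loc[male_mask] = 2

# Nested np.where: the outer (female) condition takes priority.
via_where = pd.Series(np.where(female_mask, 1, np.where(male_mask, 2, gender)))

print(via_loc.tolist())    # [2, 1, 0]
print(via_where.tolist())  # [1, 1, 0]
```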

Or, if you wish to supply a scalar default value, use numpy.select:

df['gender'] = np.select([female_mask, male_mask], [1, 2], 3)
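Note what a scalar default does here: every row that matches neither mask is set to that value, including rows that already had a gender. The small frame below (a subset of the question's data, with simplified illustrative masks) compares a scalar default of 3 with passing the existing column. NumPy documents default as a scalar; passing an array relies on it being broadcast against the choices, which works in current NumPy but is worth checking on your version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["John Smith", "Jim Jeffries", "Jennifer Johnson", "Taika Waititi"],
    "gender": [0, 2, 0, 0],
})
# Simplified illustrative masks: unlisted rows whose first name is listed.
female_mask = df["gender"].eq(0) & df["name"].str.startswith("Jennifer ")
male_mask = df["gender"].eq(0) & df["name"].str.startswith("John ")

# Scalar default: Jim Jeffries loses his existing 2, Taika Waititi gets 3.
scalar = np.select([female_mask, male_mask], [1, 2], 3)

# Existing column as default: unmatched rows keep their value.
kept = np.select([female_mask, male_mask], [1, 2], df["gender"])

print(scalar.tolist())  # [2, 3, 1, 3]
print(kept.tolist())    # [2, 2, 1, 0]
```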


