How to add a specific word to a new column when the column contains a value from a list

Question

Suppose my data set is:

name what
A    apple[red]
B    cucumber[green]
C    dog
C    orange
D    banana
D    monkey
E    cat
F    carrot
.
.

I want to define several lists, and if a value in the column contains a value from one of those lists, I want to put that list's name in a new column.

List values:

fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']

The result I want:

name what     class
A    apple    fruit
B    cucumber vegetable
C    dog      animal
C    orange   fruit
D    banana   fruit
D    monkey   animal
E    cat      animal
F    carrot   vegetable

The column values do not have to match the list values exactly; they only need to contain them.

Thank you for reading.

Answer 1

Use Series.map with a dictionary built from the lists, flattened so that each list value becomes a key and the list's name becomes its value:

fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']

d = {'fruit':fruit, 'animal':animal,'vegetable':vegetable}
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

Loop alternative to the dictionary comprehension:

d1 = {}
for oldk, oldv in d.items():
    for k in oldv:
        d1[k] = oldk

And then:

df['class'] = df['what'].map(d1)
# if you need only the text before the first '['
#df['class'] = df['what'].str.split('[').str[0].map(d1)
print (df)
  name      what      class
0    A     apple      fruit
1    B  cucumber  vegetable
2    C       dog     animal
3    C    orange      fruit
4    D    banana      fruit
5    D    monkey     animal
6    E       cat     animal
7    F    carrot  vegetable
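
As a self-contained check of the mapping approach above, here is a minimal sketch that rebuilds the question's data (the DataFrame construction and the variable names `exact` and `stripped` are assumptions added for illustration). It shows that a plain `map` leaves the bracketed rows unmatched, while splitting on the first `[` classifies every row:

```python
import pandas as pd

# Sample data reconstructed from the question (assumed construction).
df = pd.DataFrame({
    "name": ["A", "B", "C", "C", "D", "D", "E", "F"],
    "what": ["apple[red]", "cucumber[green]", "dog", "orange",
             "banana", "monkey", "cat", "carrot"],
})

d = {
    "fruit": ["apple", "banana", "orange"],
    "animal": ["dog", "monkey", "cat"],
    "vegetable": ["cucumber", "carrot"],
}

# Invert the dictionary: each word points to its class label.
d1 = {word: label for label, words in d.items() for word in words}

# Exact lookup: values with a "[...]" suffix are not keys, so they map to NaN.
exact = df["what"].map(d1)

# Keep only the text before the first "[" and then look it up.
stripped = df["what"].str.split("[").str[0].map(d1)

df["class"] = stripped
```

Here `exact` is missing for rows A and B, whereas `stripped` has a label for every row, which is why the commented-out variant is the one that fits the data as posted in the question.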

EDIT: To match by substrings, loop over the dictionary d, build a boolean mask with Series.str.contains, and set the new values where the mask is True:

d = {'fruit':fruit, 'animal':animal,'vegetable':vegetable}

for k, v in d.items():
    mask = df['what'].str.contains('|'.join(v))
    df.loc[mask, 'class'] = k
print (df)
  name             what      class
0    A       apple[red]      fruit
1    B  cucumber[green]  vegetable
2    C              dog     animal
3    C           orange      fruit
4    D           banana      fruit
5    D           monkey     animal
6    E              cat     animal
7    F           carrot  vegetable

If the column values may contain multiple words, use word boundaries so that only whole words match:

for k, v in d.items():
    pat = '|'.join(r"\b{}\b".format(x) for x in v)
    df.loc[ df['what'].str.contains(pat), 'class'] = k
print (df)
  name             what      class
0    A       apple[red]      fruit
1    B  cucumber[green]  vegetable
2    C              dog     animal
3    C           orange      fruit
4    D           banana      fruit
5    D           monkey     animal
6    E              cat     animal
7    F           carrot  vegetable
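
A loop-free variant of the word-boundary match is also possible. This is a swapped-in technique and not part of the original answer: it extracts the first listed word with `Series.str.extract` and then maps it through the inverted dictionary. The sketch below assumes the question's data; `re.escape` is added in case a list value contains regex metacharacters, and the name `pattern` is illustrative only.

```python
import re

import pandas as pd

# Sample data reconstructed from the question (assumed construction).
df = pd.DataFrame({
    "name": ["A", "B", "C", "C", "D", "D", "E", "F"],
    "what": ["apple[red]", "cucumber[green]", "dog", "orange",
             "banana", "monkey", "cat", "carrot"],
})

d = {
    "fruit": ["apple", "banana", "orange"],
    "animal": ["dog", "monkey", "cat"],
    "vegetable": ["cucumber", "carrot"],
}

# Invert the dictionary: each word points to its class label.
d1 = {word: label for label, words in d.items() for word in words}

# One capture group listing every known word, bounded by \b so that
# only whole words match; re.escape guards against regex metacharacters.
pattern = r"\b(" + "|".join(re.escape(word) for word in d1) + r")\b"

# Extract the first matching word per row (NaN if none), then map to its class.
df["class"] = df["what"].str.extract(pattern, expand=False).map(d1)
```

Compared with the loop, this takes only the first matching word in each row, so a value that contains words from two different lists gets the class of whichever word appears first in the string, not the class of the last dictionary key processed.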

Answer 1 (3, accepted, 2020-06-01 07:29:53)
