How to add a specific word to a new column when it is a value in a list within a column
Suppose my data set is:
name what
A apple[red]
B cucumber[green]
C dog
C orange
D banana
D monkey
E cat
F carrot
.
.
I want to define some lists, and if the value in the what column contains an item from one of those lists, I want to put the name of that list in a new column.
The lists:
fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']
The result I want:
name what class
A apple fruit
B cucumber vegetable
C dog animal
C orange fruit
D banana fruit
D monkey animal
E cat animal
F carrot vegetable
The list values and the column values do not match exactly; a column value only has to contain a list value.
Thank you for reading.
Use Series.map with a dictionary built from the lists. The dictionary is flattened and inverted so that each word is a key and its list name is the value:
fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']
d = {'fruit': fruit, 'animal': animal, 'vegetable': vegetable}
# Invert the mapping, see http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
Loop alternative to the dictionary comprehension:
d1 = {}
for oldk, oldv in d.items():
    for k in oldv:
        d1[k] = oldk
And then:
df['class'] = df['what'].map(d1)
# If the values carry a suffix such as [red], map the text before the first [ instead:
# df['class'] = df['what'].str.split('[').str[0].map(d1)
print(df)
name what class
0 A apple fruit
1 B cucumber vegetable
2 C dog animal
3 C orange fruit
4 D banana fruit
5 D monkey animal
6 E cat animal
7 F carrot vegetable
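For reference, the exact-match approach can be put together as one runnable sketch. The way df is constructed here is an assumption based on the sample data in the question, and the names label, word and exact are only illustrative. It also shows why a plain map is not enough when a value carries a suffix such as [red].

```python
import pandas as pd

# Sample data rebuilt from the question (assumed construction of df).
df = pd.DataFrame({
    "name": ["A", "B", "C", "C", "D", "D", "E", "F"],
    "what": ["apple[red]", "cucumber[green]", "dog", "orange",
             "banana", "monkey", "cat", "carrot"],
})

d = {
    "fruit": ["apple", "banana", "orange"],
    "animal": ["dog", "monkey", "cat"],
    "vegetable": ["cucumber", "carrot"],
}

# Invert the mapping: each word becomes a key pointing at its list name.
d1 = {word: label for label, words in d.items() for word in words}

# Exact lookup: "apple[red]" and "cucumber[green]" are not keys of d1,
# so these two rows come back as NaN.
exact = df["what"].map(d1)

# Look up only the text before the first "[" instead.
df["class"] = df["what"].str.split("[").str[0].map(d1)
print(df)
```

The split variant only helps when the extra text always follows a "[". For arbitrary surrounding text, substring matching is needed.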
EDIT: To match by substring, loop over the dictionary d, build a mask with Series.str.contains, and set the new values where the mask is True:
d = {'fruit': fruit, 'animal': animal, 'vegetable': vegetable}
for k, v in d.items():
    mask = df['what'].str.contains('|'.join(v))
    df.loc[mask, 'class'] = k
print(df)
name what class
0 A apple[red] fruit
1 B cucumber[green] vegetable
2 C dog animal
3 C orange fruit
4 D banana fruit
5 D monkey animal
6 E cat animal
7 F carrot vegetable
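A self-contained sketch of the substring approach, again assuming df is built from the question's sample data. Two details here are additions, not part of the answer above: re.escape protects against words that contain regex metacharacters, and na=False treats missing values as non-matches.

```python
import re
import pandas as pd

# Sample data rebuilt from the question (assumed construction of df).
df = pd.DataFrame({
    "name": ["A", "B", "C", "C", "D", "D", "E", "F"],
    "what": ["apple[red]", "cucumber[green]", "dog", "orange",
             "banana", "monkey", "cat", "carrot"],
})

d = {
    "fruit": ["apple", "banana", "orange"],
    "animal": ["dog", "monkey", "cat"],
    "vegetable": ["cucumber", "carrot"],
}

for label, words in d.items():
    # One alternation pattern per list, e.g. "apple|banana|orange".
    pattern = "|".join(re.escape(w) for w in words)
    # na=False makes missing values count as "no match".
    mask = df["what"].str.contains(pattern, na=False)
    # Rows matching this list get its name; the column is created on first use.
    df.loc[mask, "class"] = label

print(df)
```

If a value matches words from more than one list, the list processed last wins, because each pass overwrites the rows it matches.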
If a value can contain several words, use word boundaries so that only whole words match:
for k, v in d.items():
    pat = '|'.join(r"\b{}\b".format(x) for x in v)
    df.loc[df['what'].str.contains(pat), 'class'] = k
print(df)
name what class
0 A apple[red] fruit
1 B cucumber[green] vegetable
2 C dog animal
3 C orange fruit
4 D banana fruit
5 D monkey animal
6 E cat animal
7 F carrot vegetable
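The sample data gives the same result with and without word boundaries, so the difference is easier to see on other values. The Series below is hypothetical and not taken from the question; it is a small sketch comparing the two patterns for the animal list.

```python
import re
import pandas as pd

animal = ["dog", "monkey", "cat"]

# Hypothetical values chosen to show where the two patterns differ.
s = pd.Series(["cat", "black cat", "caterpillar", "dogma", "monkey[brown]"])

# Plain alternation: matches the words anywhere, even inside longer words.
plain = "|".join(re.escape(w) for w in animal)

# Word-boundary alternation: matches the words only as whole words.
bounded = "|".join(r"\b{}\b".format(re.escape(w)) for w in animal)

plain_mask = s.str.contains(plain)
bounded_mask = s.str.contains(bounded)

print(pd.DataFrame({"value": s, "plain": plain_mask, "bounded": bounded_mask}))
```

With the plain pattern every row matches, including "caterpillar" and "dogma". With word boundaries only "cat", "black cat" and "monkey[brown]" match, since "[" is not a word character and so still counts as a boundary.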