How to add a specific word to a new column when it is a value in a list within a column
Suppose my dataset is:
name what
A apple[red]
B cucumber[green]
C dog
C orange
D banana
D monkey
E cat
F carrot
.
.
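I want to define several lists, and if the value in the column contains an item from one of those lists, I want to write the corresponding label into a new column.
List values: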
fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']
The result I want:
name what class
A apple fruit
B cucumber vegetable
C dog animal
C orange fruit
D banana fruit
D monkey animal
E cat animal
F carrot vegetable
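The list values and the column values do not have to match exactly; the column value only has to contain the list value.
Thank you for reading.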
Use Series.map with a dictionary built from the lists, swapping the keys with the flattened values:
fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']
d = {'fruit':fruit, 'animal':animal,'vegetable':vegetable}
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
A loop alternative to the dictionary comprehension:
d1 = {}
for oldk, oldv in d.items():
    for k in oldv:
        d1[k] = oldk
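As a quick check, the comprehension and the loop can be compared directly. This is a minimal sketch; the names `label`, `items`, `item` and `d1_loop` are chosen here for readability and are not from the original answer.

```python
d = {
    "fruit": ["apple", "banana", "orange"],
    "animal": ["dog", "monkey", "cat"],
    "vegetable": ["cucumber", "carrot"],
}

# Invert {label: [items]} into {item: label} with a comprehension.
d1 = {item: label for label, items in d.items() for item in items}

# The same inversion written as nested loops.
d1_loop = {}
for label, items in d.items():
    for item in items:
        d1_loop[item] = label

print(d1 == d1_loop)
print(d1["apple"], d1["cat"], d1["carrot"])
```

If the same item appeared in more than one list, the label assigned last would win, because a dictionary key can hold only one value.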
Then:
df['class'] = df['what'].map(d1)
# if you need only the part before the first '['
#df['class'] = df['what'].str.split('[').str[0].map(d1)
print (df)
name what class
0 A apple fruit
1 B cucumber vegetable
2 C dog animal
3 C orange fruit
4 D banana fruit
5 D monkey animal
6 E cat animal
7 F carrot vegetable
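The mapping step can be tried end to end with the sample data from the question. The sketch below rebuilds the DataFrame by hand (an assumption, since the question does not show how `df` was created) and uses the commented-out `str.split` variant so that values such as `apple[red]` are also mapped. The names `exact` and `base` are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "name": ["A", "B", "C", "C", "D", "D", "E", "F"],
    "what": ["apple[red]", "cucumber[green]", "dog", "orange",
             "banana", "monkey", "cat", "carrot"],
})

d = {
    "fruit": ["apple", "banana", "orange"],
    "animal": ["dog", "monkey", "cat"],
    "vegetable": ["cucumber", "carrot"],
}
d1 = {item: label for label, items in d.items() for item in items}

# Exact lookup: 'apple[red]' and 'cucumber[green]' are not keys of d1,
# so those two rows come back as missing values.
exact = df["what"].map(d1)

# Keep only the text before the first '[' and map that instead.
base = df["what"].str.split("[").str[0]
df["class"] = base.map(d1)

print(exact.isna().sum())
print(df)
```

Series.map performs exact dictionary lookups, so it only fits when the key can be isolated from the cell value first, as the split does here.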
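Edit: For substring matching, you can loop over the dictionary d, test each row with Series.str.contains to get a boolean mask, and set the new value where the mask is True: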
d = {'fruit':fruit, 'animal':animal,'vegetable':vegetable}
for k, v in d.items():
    mask = df['what'].str.contains('|'.join(v))
    df.loc[mask, 'class'] = k
print (df)
name what class
0 A apple[red] fruit
1 B cucumber[green] vegetable
2 C dog animal
3 C orange fruit
4 D banana fruit
5 D monkey animal
6 E cat animal
7 F carrot vegetable
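A self-contained version of the substring approach is sketched below. Two details go beyond the original answer and are assumptions about what is wanted: `re.escape` is applied so that list items containing regex metacharacters are matched literally, and `na=False` makes missing values in `what` count as non-matches. The DataFrame is rebuilt by hand from the question's sample.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "C", "C", "D", "D", "E", "F"],
    "what": ["apple[red]", "cucumber[green]", "dog", "orange",
             "banana", "monkey", "cat", "carrot"],
})

d = {
    "fruit": ["apple", "banana", "orange"],
    "animal": ["dog", "monkey", "cat"],
    "vegetable": ["cucumber", "carrot"],
}

for label, items in d.items():
    # Build one alternation pattern per label, e.g. 'apple|banana|orange'.
    pattern = "|".join(re.escape(item) for item in items)
    # True where 'what' contains any item of this list as a substring.
    mask = df["what"].str.contains(pattern, na=False)
    df.loc[mask, "class"] = label

print(df)
```

If a row matches items from more than one list, the label processed last overwrites the earlier one, so the order of the dictionary matters in that case.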
If the values may contain multiple words, use word boundaries:
for k, v in d.items():
    pat = '|'.join(r"\b{}\b".format(x) for x in v)
    df.loc[df['what'].str.contains(pat), 'class'] = k
print (df)
name what class
0 A apple[red] fruit
1 B cucumber[green] vegetable
2 C dog animal
3 C orange fruit
4 D banana fruit
5 D monkey animal
6 E cat animal
7 F carrot vegetable
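On the question's sample the two patterns give the same result, so the effect of `\b` is easier to see on other inputs. The rows below are hypothetical (they are not in the question) and were picked to show where plain substring matching and word-boundary matching disagree; `re.escape` is again an added precaution.

```python
import re
import pandas as pd

# Hypothetical values, chosen only to illustrate the difference.
s = pd.Series(["cat", "catalog", "black cat[small]", "pineapple"])
animal = ["dog", "monkey", "cat"]

# Plain alternation: matches 'cat' anywhere, including inside 'catalog'.
plain = "|".join(re.escape(x) for x in animal)
# Word-bounded alternation: 'cat' must stand alone as a word.
bounded = "|".join(r"\b{}\b".format(re.escape(x)) for x in animal)

plain_mask = s.str.contains(plain)
bounded_mask = s.str.contains(bounded)

print(pd.DataFrame({"what": s, "plain": plain_mask, "bounded": bounded_mask}))
```

Here `catalog` matches the plain pattern but not the bounded one, while `black cat[small]` matches both, because the space and the `[` each form a word boundary around `cat`.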