Extract only if there is exactly one matching word from the list
fruit_type = ['Apple','Banana','Cherries','Dragonfruit']
# str.extract is vectorized over the whole column, so no row loop is needed
df['fruit_type'] = df['sentence'].str.extract("(" + "|".join(fruit_type) + ")", expand=False)
The result of the code above is:
df
sentence | fruit_type
here is an apple | apple
here is a banana, an apple | banana
here is an orange, a banana | banana
How can I modify the code so that df['fruit_type'] is NaN whenever df['sentence'] contains more than one fruit type?
Instead of extract, you can use extractall combined with groupby and apply:
First, get all the matches:
df['sentence'].str.extractall("(" + "|".join(fruit_type) +")")
0
match
0 0 apple
1 0 banana
1 apple
2 0 banana
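Note that the result has a pandas.MultiIndex: the first level is the original row index and the second level, named match, numbers the matches within each row.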
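To make the MultiIndex concrete, here is a minimal sketch that inspects it and counts the matches per row. The Series s and the lowercase pattern are assumptions chosen to mirror the sample sentences; they are not taken from the original code.

```python
import pandas as pd

# Hypothetical sample data mirroring the sentences in the question
s = pd.Series([
    'here is an apple',
    'here is a banana, an apple',
    'here is an orange, a banana',
])

# One row per match; the index gains a second level named 'match'
matches = s.str.extractall("(apple|banana)")
print(matches.index.names)

# Grouping on level 0 (the original row index) gives the match count per row
counts = matches.groupby(level=0).size()
print(counts)
```

Rows with a count greater than 1 are the ones that should end up as NaN.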
Then, with .groupby(level=0)[0].apply(list), you get:
0 [apple]
1 [banana, apple]
2 [banana]
Finally, use .apply(lambda x: x[0] if len(x) == 1 else np.nan):
0 apple
1 NaN
2 banana
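Putting the three steps together, the following is a sketch of the whole pipeline rather than a definitive implementation. It makes a few assumptions not stated in the answer: the sample DataFrame (including a fourth row with no fruit at all), the use of re.escape on the list entries, and flags=re.IGNORECASE, which is needed because the list is capitalized while the sentences are lowercase.

```python
import re

import numpy as np
import pandas as pd

fruit_type = ['Apple', 'Banana', 'Cherries', 'Dragonfruit']

# Hypothetical sample data; the last row contains no listed fruit
df = pd.DataFrame({'sentence': [
    'here is an apple',
    'here is a banana, an apple',
    'here is an orange, a banana',
    'here is a pear',
]})

# Build one capture group of alternatives, escaping any regex metacharacters
pattern = "(" + "|".join(map(re.escape, fruit_type)) + ")"

# Step 1: every match, one row each, indexed by (original row, match number)
matches = df['sentence'].str.extractall(pattern, flags=re.IGNORECASE)

# Step 2: collect the matches of each original row into a list
per_row = matches.groupby(level=0)[0].apply(list)

# Step 3: keep the value only when the row has exactly one match
single = per_row.apply(lambda x: x[0] if len(x) == 1 else np.nan)

# Assignment aligns on the index, so rows with no match at all also get NaN
df['fruit_type'] = single
print(df)
```

Because the assignment aligns on the index, sentences with zero matches and sentences with several matches both end up as NaN, so no separate handling is needed for the no-match case.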