
Extract only if there is only one matching word from the list

fruit_type = ['apple','banana','cherries','dragonfruit']


# str.extract keeps only the first match in each sentence
df['fruit_type'] = df['sentence'].str.extract("(" + "|".join(fruit_type) + ")", expand=False)

The code above produces:

df
sentence                    | fruit_type
here is an apple            | apple
here is a banana, an apple  | banana
here is an orange, a banana | banana
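To reproduce the behaviour above, here is a minimal sketch. The DataFrame contents are reconstructed from the table (assumed data), and the fruit list is assumed to be lowercase so that it matches the lowercase sentences, since str.extract is case-sensitive by default.

```python
import pandas as pd

# Sample sentences reconstructed from the table above (assumed data).
df = pd.DataFrame({"sentence": [
    "here is an apple",
    "here is a banana, an apple",
    "here is an orange, a banana",
]})

# Lowercase fruit list (assumed) so the case-sensitive regex matches the sentences.
fruit_type = ["apple", "banana", "cherries", "dragonfruit"]

# Builds the alternation pattern "(apple|banana|cherries|dragonfruit)".
pattern = "(" + "|".join(fruit_type) + ")"

# str.extract keeps only the first match per row: row 1 gets "banana" even though
# it also mentions "apple", and "orange" is not in the list, so row 2 gets "banana".
df["fruit_type"] = df["sentence"].str.extract(pattern, expand=False)
print(df)
```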

How can I modify the code so that df['fruit_type'] is NaN whenever df['sentence'] contains more than one fruit type?

Instead of extract, you can use extractall combined with groupby and apply.

First, get all matches:

df['sentence'].str.extractall("(" + "|".join(fruit_type) +")")
        0
match   
0   0   apple
1   0   banana
    1   apple
2   0   banana

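Note that the result has a pandas.MultiIndex: the first level is the original row index, and the second level (match) numbers the matches within each row.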
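A short sketch, using the same assumed sample data as the question, that makes the MultiIndex explicit. Each index entry is a tuple of (original row label, match number), which is why the following step groups on level 0.

```python
import pandas as pd

# Assumed sample data reconstructed from the question.
df = pd.DataFrame({"sentence": [
    "here is an apple",
    "here is a banana, an apple",
    "here is an orange, a banana",
]})
fruit_type = ["apple", "banana", "cherries", "dragonfruit"]
pattern = "(" + "|".join(fruit_type) + ")"

# One output row per match; the unnamed capture group becomes column 0.
all_matches = df["sentence"].str.extractall(pattern)

# Level 0 is the original row label, level 1 ("match") counts matches within that row.
print(all_matches.index.names)
print(all_matches.index.tolist())
print(all_matches[0].tolist())
```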

Then, .groupby(level=0)[0].apply(list) collects the matches of each row into a list:

0            [apple]
1    [banana, apple]
2           [banana]
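One detail worth checking at this step: a sentence with no fruit from the list produces no rows in the extractall result, so it is also absent from the grouped Series. The sketch below adds a fourth, hypothetical sentence ("here is a kiwi") to the assumed sample data to show this.

```python
import pandas as pd

df = pd.DataFrame({"sentence": [
    "here is an apple",
    "here is a banana, an apple",
    "here is an orange, a banana",
    "here is a kiwi",  # hypothetical row with no listed fruit
]})
fruit_type = ["apple", "banana", "cherries", "dragonfruit"]
pattern = "(" + "|".join(fruit_type) + ")"

# Collect the matches of each original row into a list.
per_row = df["sentence"].str.extractall(pattern).groupby(level=0)[0].apply(list)
print(per_row)

# Row 3 had no match, so its label does not appear in the grouped index.
print(per_row.index.tolist())
```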

Finally, .apply(lambda x: x[0] if len(x) == 1 else np.nan) keeps the fruit only when a row has exactly one match (this needs import numpy as np):

0     apple
1       NaN
2    banana
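Putting the steps together, here is a sketch that writes the result back to df['fruit_type'], again on assumed sample data with a hypothetical "kiwi" row. Assignment aligns on the index, so a row that matched nothing also ends up as NaN. The second half shows an alternative that is not part of the answer above: Series.str.findall returns a list of matches per row directly (an empty list when nothing matches), so no groupby is needed; the column name fruit_type_findall is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"sentence": [
    "here is an apple",
    "here is a banana, an apple",
    "here is an orange, a banana",
    "here is a kiwi",  # hypothetical row with no listed fruit
]})
fruit_type = ["apple", "banana", "cherries", "dragonfruit"]
pattern = "(" + "|".join(fruit_type) + ")"

# Approach from the answer: extractall -> group matches per row -> keep single matches.
per_row = df["sentence"].str.extractall(pattern).groupby(level=0)[0].apply(list)
single = per_row.apply(lambda x: x[0] if len(x) == 1 else np.nan)

# Index alignment fills rows missing from `single` (no match at all) with NaN.
df["fruit_type"] = single

# Alternative (not from the answer): str.findall yields one list per row, [] when nothing matches.
found = df["sentence"].str.findall(pattern)
df["fruit_type_findall"] = found.apply(lambda x: x[0] if len(x) == 1 else np.nan)

print(df)
```

Both columns agree on this data. Note that the pattern also matches substrings (for example apple inside pineapple); wrapping it as r"\b(" + "|".join(fruit_type) + r")\b" restricts it to whole words.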


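Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.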

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM