
Remove words that appear fewer than 2 times in the text from a Pandas Series

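I am trying to remove every word that appears fewer than 2 times across the scalar values of a Pandas Series. What is the best way to do it? Here is my failed attempt: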

from collections import Counter
df = pd.DataFrame({'text':["The quick brown fox", "jumped over the lazy dog","jumped over the lazy dog"]})
d=''.join(df['text'][:])
m=d.split()
q=Counter(m)
print (q)
df['text'].str.split().map(lambda el: " ".join(Counter(el for el in q.elements() if q[el] >= 2)))

Output:

Counter({'over': 2, 'the': 2, 'lazy': 2, 'The': 1, 'quick': 1, 'brown': 1, 'foxjumped': 1, 'dogjumped': 1, 'dog': 1})
0    over the lazy
1    over the lazy
2    over the lazy
Name: text, dtype: object
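The `foxjumped` and `dogjumped` keys in the counter above point at one likely cause of the wrong counts: `''.join(...)` concatenates the rows with no separator, so the last word of one row fuses with the first word of the next. A minimal sketch of the difference, in plain Python, where the list `rows` is a stand-in for `df['text']` (an assumption made only to keep the example free of pandas):

```python
from collections import Counter

# Stand-in for df['text'] from the question
rows = ["The quick brown fox", "jumped over the lazy dog", "jumped over the lazy dog"]

# Joining with an empty string fuses words at the row boundaries
fused = Counter("".join(rows).split())

# Joining with a space keeps the row boundaries as word boundaries
spaced = Counter(" ".join(rows).split())

print(fused["foxjumped"], fused["jumped"])
print(spaced["foxjumped"], spaced["jumped"])
```

Note that the attempt has a second issue as well: the lambda iterates over `q.elements()` (all words in the counter) instead of the words of the current row, which is why every row comes back identical.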

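from collections import Counter

import pandas as pd

df = pd.DataFrame({'text':["The quick brown fox", "jumped over the lazy dog","jumped over the lazy dog"]})
# Count every word across all rows (split each row, then flatten with explode)
c = Counter(df.text.str.split().explode())
# Keep only the words whose overall count is at least 2
print( df.text.apply(lambda x: ' '.join(w for w in x.split() if c[w] >= 2).strip()) )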

Prints:

0                            
1    jumped over the lazy dog
2    jumped over the lazy dog
Name: text, dtype: object
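Row 0 comes back empty because `The` and `the` are counted as different words. If that is not what you want, the same idea could be wrapped in a small helper. This is only a sketch: the function name `drop_rare_words` and its `min_count` and `ignore_case` parameters are hypothetical, not part of pandas, and it assumes every value in the Series is a string (no NaN).

```python
from collections import Counter

import pandas as pd


def drop_rare_words(texts: pd.Series, min_count: int = 2, ignore_case: bool = False) -> pd.Series:
    """Keep only words that occur at least `min_count` times across the whole Series."""

    def key(word: str) -> str:
        # Counting key: lowercased when ignore_case is set, otherwise unchanged
        return word.lower() if ignore_case else word

    # Count words over all rows, not per row
    counts = Counter(key(w) for text in texts for w in text.split())

    # Rebuild each row from the words that are frequent enough
    return texts.apply(
        lambda text: " ".join(w for w in text.split() if counts[key(w)] >= min_count)
    )


df = pd.DataFrame({"text": ["The quick brown fox", "jumped over the lazy dog", "jumped over the lazy dog"]})
case_sensitive = drop_rare_words(df["text"])
case_insensitive = drop_rare_words(df["text"], ignore_case=True)
print(case_sensitive)
print(case_insensitive)
```

With `ignore_case=True`, `The` in row 0 is kept, since it is counted together with the two occurrences of `the` in the other rows.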

