
Removing all rows with fewer than 3 characters from a pandas DataFrame

I have this DataFrame:

    Word        Frequency
0   :           79
1   ,           60
2   look        26
3   e           26
4   a           25
... ...         ...
95  trump       2
96  election    2
97  step        2
98  day         2
99  university  2

I would like to remove all words that have fewer than 3 characters. I tried the following:

df['Word'] = df['Word'].str.findall(r'\w{3,}').str.join(' ')

but it does not remove them from my dataset. Can you please tell me how to remove them? My expected output would be:

    Word        Frequency
2   look        26
... ...         ...
95  trump       2
96  election    2
97  step        2
98  day         2
99  university  2

Try:

df = df[df['Word'].str.len()>=3]

Instead of using a regular expression, you can use .str.len() to get the length of each string in the column. Then you can simply filter on that length, keeping the rows where it is >= 3.

It should look like this:

df.loc[df["Word"].str.len() >= 3]
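As a minimal sketch of why the regex attempt leaves the rows in place and how the length filter behaves, the snippet below builds a small hypothetical frame modelled on a few rows from the question (the sample values and the names attempt, mask and filtered are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical sample data modelled on rows shown in the question.
df = pd.DataFrame({
    "Word": [":", ",", "look", "e", "a", "trump", "day"],
    "Frequency": [79, 60, 26, 26, 25, 2, 2],
})

# The regex attempt rewrites each cell but never drops a row:
# words with no 3+ character match become empty strings.
attempt = df["Word"].str.findall(r"\w{3,}").str.join(" ")
print(len(attempt))  # still one value per original row

# Boolean mask: True where the word has at least 3 characters.
mask = df["Word"].str.len() >= 3

# Keep only the rows where the mask is True; index labels are preserved.
filtered = df.loc[mask]
print(filtered)
```

Assign the result back (for example df = df.loc[mask]) if the short words should be gone from df itself, since the filter returns a new object rather than modifying df in place.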

Please try:

 df[df.Word.str.len()>=3]

