Remove rows from a pandas DataFrame whose sentences exceed a certain word count
I want to remove the rows of a pandas DataFrame in which the string in a particular column contains more words than a desired limit.
For example:
Input frame:
X Y
0 Hi how are you.
1 An apple
2 glass of water
3 I like to watch movie
Now, say I want to remove the rows whose string in column 'Y' contains 4 or more words.
The desired output frame must be:
X Y
1 An apple
2 glass of water
The rows with the values 0 and 3 in column 'X' are removed because their strings in column 'Y' contain 4 and 5 words, respectively.
First split the values on whitespace, get the number of words in each row with Series.str.len, then invert the condition (>= becomes <) and apply it with Series.lt for boolean indexing:
df = df[df['Y'].str.split().str.len().lt(4)]
#alternative with inverted mask by ~
#df = df[~df['Y'].str.split().str.len().ge(4)]
print (df)
X Y
1 1 An apple
2 2 glass of water
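For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt from the example in the question; the names `max_words`, `word_counts`, and `filtered` are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "X": [0, 1, 2, 3],
    "Y": ["Hi how are you.", "An apple", "glass of water", "I like to watch movie"],
})

max_words = 4  # illustrative name: rows with this many words or more are dropped

# Split each string on whitespace and count the resulting tokens per row.
word_counts = df["Y"].str.split().str.len()

# Keep only the rows whose word count is strictly below the limit.
filtered = df[word_counts.lt(max_words)]
print(filtered)
```

Keeping the word counts in their own variable makes it easy to inspect them before filtering.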
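Alternatively, you can count the runs of whitespace; a string with fewer than 3 of them has fewer than 4 words (assuming no leading or trailing whitespace):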
df[df.Y.str.count(r'\s+').lt(3)]
X Y
1 1 An apple
2 2 glass of water
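One caveat with counting whitespace: leading or trailing spaces are counted too, so the result can differ from the split-based word count. The sketch below uses a small made-up Series (not data from the question) to show the difference and one possible fix with Series.str.strip.

```python
import pandas as pd

# Made-up sample: the second value has leading and trailing spaces.
s = pd.Series(["An apple", "  An apple  ", "glass of water"])

# Counting whitespace runs directly also counts the padding.
raw_counts = s.str.count(r"\s+")

# Stripping first leaves only the separators between words.
stripped_counts = s.str.strip().str.count(r"\s+")

# Splitting on whitespace ignores padding and yields the word count itself.
split_counts = s.str.split().str.len()

print(pd.DataFrame({"raw": raw_counts, "stripped": stripped_counts, "words": split_counts}))
```

If the column may contain padded strings, the split-based approach or stripping beforehand is probably the safer choice.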