Remove rows from a pandas DataFrame whose sentences are longer than a certain number of words

I want to remove the rows of a pandas DataFrame in which the string in a particular column is longer, counted in words, than a desired length.

For example:

Input frame:

X    Y
0    Hi how are you.
1    An apple
2    glass of water
3    I like to watch movie

Now, say I want to remove from the DataFrame every row whose string in column 'Y' contains 4 or more words.

The desired output frame is:

X    Y
1    An apple
2    glass of water

The rows with values 0 and 3 in column 'X' are removed because their strings in column 'Y' contain 4 and 5 words, respectively.

First split the values on whitespace, get the number of words in each row with Series.str.len, then invert the condition from >= to < and use Series.lt to build the mask for boolean indexing:

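# keep rows whose Y value splits into fewer than 4 whitespace-separated words
df = df[df['Y'].str.split().str.len().lt(4)]
# alternative: build the mask for "4 or more words" and invert it with ~
# df = df[~df['Y'].str.split().str.len().ge(4)]
print(df)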
   X               Y
1  1        An apple
2  2  glass of water
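For reference, here is a self-contained sketch of the same approach that builds the example frame from the question first. The helper name `keep_short_rows` and its `max_words` parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd


def keep_short_rows(df, column, max_words):
    """Return the rows whose text in `column` has fewer than `max_words` words.

    Hypothetical helper for illustration; words are whitespace-separated tokens.
    """
    # str.split() with no arguments splits on runs of whitespace,
    # and str.len() then gives the length of each resulting list.
    word_counts = df[column].str.split().str.len()
    return df[word_counts.lt(max_words)]


# The input frame from the question.
df = pd.DataFrame({
    "X": [0, 1, 2, 3],
    "Y": ["Hi how are you.", "An apple", "glass of water", "I like to watch movie"],
})

result = keep_short_rows(df, "Y", 4)
print(result)
```

Returning a filtered copy rather than reassigning `df` inside the function leaves the original frame available for comparison.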

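You can count the runs of whitespace instead. A string of n words with no leading or trailing whitespace contains n - 1 such runs, so "fewer than 4 words" becomes "fewer than 3 matches":

df[df.Y.str.count(r'\s+').lt(3)]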

   X               Y
1  1        An apple
2  2  glass of water
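One caveat with counting whitespace: the result only matches the word count when the strings have no leading or trailing whitespace. The sketch below compares the two approaches on a small made-up Series (the padded value is an assumption added for illustration, it is not in the question's data).

```python
import pandas as pd

# Made-up sample: the second value is padded with extra spaces.
s = pd.Series(["An apple", "  An apple  ", "glass of water"])

# Word count via splitting on whitespace (padding is ignored).
by_split = s.str.split().str.len()

# Word count estimated as "runs of whitespace + 1" (padding is counted).
by_spaces = s.str.count(r"\s+") + 1

# Stripping first brings the two approaches back in line for these values.
by_spaces_stripped = s.str.strip().str.count(r"\s+") + 1

print(pd.DataFrame({
    "text": s,
    "by_split": by_split,
    "by_spaces": by_spaces,
    "by_spaces_stripped": by_spaces_stripped,
}))
```

If the column may contain padded text, calling `str.strip()` before counting, or using the split-based count, avoids overcounting.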
