How to filter rows and words in lower case in pandas dataframe?

Hi, I would like to know how to select the rows that contain lowercase words in the following dataframe:

ID     Name   Note
1      Fin    there IS A dog outside
2      Mik    NOTHING TO DECLARE
3      Lau    no house

What I would like to do is keep only the rows where the Note column contains at least one lowercase word:

ID     Name   Note
1      Fin    there IS A dog outside
3      Lau    no house

and collect all the lowercase words in a list: my_list=['there','dog','outside','no','house']

What I have tried for filtering the rows is:

df1=df['Note'].str.lower()

To collect the words in the list, I think I should first tokenise the string and then select all the lowercase terms. Am I right?

Use Series.str.contains with boolean indexing to keep the rows whose Note contains at least one lowercase character:

df1 = df[df['Note'].str.contains(r'[a-z]')]
print(df1)
   ID Name                    Note
0   1  Fin  there IS A dog outside
2   3  Lau                no house
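
Note that the pattern [a-z] matches any single lowercase character, so a mixed-case word such as "Nothing" would also keep its row. The sketch below rebuilds the sample dataframe and compares that filter with a word-level pattern. The fourth row ("Bob", "Nothing HERE") is a hypothetical addition for illustration and is not part of the question's data.

```python
import pandas as pd

# Sample data from the question; the "Bob" row is a hypothetical extra case
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Name": ["Fin", "Mik", "Lau", "Bob"],
    "Note": [
        "there IS A dog outside",
        "NOTHING TO DECLARE",
        "no house",
        "Nothing HERE",
    ],
})

# Character-level test: any lowercase letter anywhere in the string
char_level = df[df["Note"].str.contains(r"[a-z]")]

# Word-level test: at least one word made up entirely of a-z
word_level = df[df["Note"].str.contains(r"\b[a-z]+\b")]

print(char_level["ID"].tolist())
print(word_level["ID"].tolist())
```

With this data the character-level filter keeps IDs 1, 3 and 4, while the word-level filter keeps only IDs 1 and 3. For the three rows in the question both filters give the same result.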

Then use Series.str.extractall to extract the lowercase words:

my_list = df1['Note'].str.extractall(r'(\b[a-z]+\b)')[0].tolist()
print(my_list)
['there', 'dog', 'outside', 'no', 'house']

Or use a list comprehension that splits each sentence into words and filters them with str.islower:

my_list = [y for x in df1['Note'] for y in x.split() if y.islower()]
print(my_list)
['there', 'dog', 'outside', 'no', 'house']
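
The two approaches agree on the sample data, but they can differ once a note contains punctuation. A minimal sketch using only the standard library, with a made-up note string that is not from the question:

```python
import re

# Hypothetical note with punctuation, used only for illustration
note = "no house, OK"

# Whitespace split keeps punctuation attached; "house,".islower() is still True
by_split = [w for w in note.split() if w.islower()]

# The regex keeps only runs of a-z delimited by word boundaries
by_regex = re.findall(r"\b[a-z]+\b", note)

print(by_split)  # ['no', 'house,']
print(by_regex)  # ['no', 'house']
```

If the notes may contain commas or full stops, the regex version is probably the safer choice, since the split version returns tokens with the punctuation still attached.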
