Python Pandas: Is there a way to obtain a subset dataframe based on strings in a list
I am looking to make a subset df based on the string values in a list.
A toy example:
import pandas as pd

data = {'month': ['January','February','March','April','May','June','July','August','September','October','November','December'],
        'days_in_month': [31,28,31,30,31,30,31,31,30,31,30,31]
}
df = pd.DataFrame(data, columns=['month', 'days_in_month'])
summer_months = ['Dec', 'Jan', 'Feb']
contain_values = df[df['month'].str.contains(summer_months)]  # raises TypeError
print(df)
This fails on the line contain_values = df[df['month'].str.contains(summer_months)] with:
TypeError: unhashable type: 'list'
I know that contain_values = df[df['month'].str.contains('Dec')] works, but I would like to return a new dataframe with all the summer months in it, or even all the non-summer months using the ~ operator.
Thanks
>>> contain_values = df[df['month'].str.contains('|'.join(summer_months))]
>>> contain_values
       month  days_in_month
0    January             31
1   February             28
11  December             31
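The same mask also covers the second part of the question: prefixing it with ~ inverts it and returns the non-summer months. Below is a minimal sketch, assuming the df and summer_months from the question. The re.escape call is an addition that only matters if the list entries could contain regex metacharacters, and the names pattern, mask, summer and non_summer are purely illustrative.

```python
import re
import pandas as pd

data = {
    'month': ['January', 'February', 'March', 'April', 'May', 'June',
              'July', 'August', 'September', 'October', 'November', 'December'],
    'days_in_month': [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
}
df = pd.DataFrame(data, columns=['month', 'days_in_month'])
summer_months = ['Dec', 'Jan', 'Feb']

# Build one regex alternation ('Dec|Jan|Feb'); re.escape guards against
# list entries that contain regex metacharacters such as '.' or '+'.
pattern = '|'.join(re.escape(m) for m in summer_months)

# Boolean mask: True where the month name contains any of the strings.
mask = df['month'].str.contains(pattern)

summer = df[mask]        # rows whose month matches the pattern
non_summer = df[~mask]   # ~ inverts the mask, giving every other row
```

Keeping the mask in its own variable means the pattern is evaluated once and both subsets come from the same result.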
You can also use what the .str accessor offers, here by comparing the first three letters of each name against the list:
df[df["month"].str[:3].isin(summer_months)]
OUTPUT
       month  days_in_month
0    January             31
1   February             28
11  December             31
You can make it more robust with something like this (in case the names in the dataframe are not properly capitalized):
df[df["month"].str.capitalize().str[:3].isin(summer_months)]
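One way to check the capitalization-tolerant filter is to run it on deliberately messy data. The sketch below assumes a small hypothetical dataframe named messy (not part of the original question) and reuses the summer_months list. It also shows the ~ inversion for the non-summer rows.

```python
import pandas as pd

# Hypothetical data with inconsistent capitalization (illustration only).
messy = pd.DataFrame({
    'month': ['january', 'FEBRUARY', 'March', 'december'],
    'days_in_month': [31, 28, 31, 31],
})
summer_months = ['Dec', 'Jan', 'Feb']

# capitalize() upper-cases the first letter and lower-cases the rest,
# so 'FEBRUARY' becomes 'February' before the first three letters are taken.
prefix = messy['month'].str.capitalize().str[:3]

# isin() does an exact membership test of each prefix against the list.
mask = prefix.isin(summer_months)

summer = messy[mask]        # rows whose normalized prefix is in the list
non_summer = messy[~mask]   # all remaining rows
```

Note that this relies on the list entries themselves being written as 'Dec', 'Jan', 'Feb'; if the list could also be inconsistently cased, both sides would need the same normalization.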