Select rows from a DataFrame based on string values in a column in pandas
How do I select rows from a DataFrame based on string values in a column in pandas? I want to display only the states, which are the values written in all caps. Each state row holds the totals for its cities.
import pandas as pd
import matplotlib.pyplot as plt
%pylab inline
d = pd.read_csv("states.csv")
print(d)
# States/cities B C D
# 0 FL 3 5 6
# 1 Orlando 1 2 3
# 2 Miami 1 1 3
# 3 Jacksonville 1 2 0
# 4 CA 8 3 2
# 5 San diego 3 1 0
# 6 San Francisco 5 2 2
# 7 WA 4 2 1
# 8 Seattle 3 1 0
# 9 Tacoma 1 1 1
How can I display only the state rows, like so?
# States/cities B C D
# 0 FL 3 5 6
# 4 CA 8 3 2
# 7 WA 4 2 1
Consider pandas.Series.str.match, passing a regex that accepts only the characters [A-Z]:
states[states['States/cities'].str.match('^[A-Z]+$')]
# States/cities B C D
# 0 FL 3 5 6
# 4 CA 8 3 2
# 7 WA 4 2 1
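As a variation on the regex approach, here is a minimal sketch using str.fullmatch, which (in pandas 1.1 and later) succeeds only when the whole value matches, so no ^ or $ anchors are needed. The DataFrame is rebuilt inline from the question's table, and the value "Miami FL" is a hypothetical one added only to show a mixed-case string being rejected.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "Miami", "Jacksonville", "CA",
                      "San diego", "San Francisco", "WA", "Seattle", "Tacoma"],
    "B": [3, 1, 1, 1, 8, 3, 5, 4, 3, 1],
    "C": [5, 2, 1, 2, 3, 1, 2, 2, 1, 1],
    "D": [6, 3, 3, 0, 2, 0, 2, 1, 0, 1],
})

# fullmatch is True only if the entire value consists of uppercase A-Z.
is_state = df["States/cities"].str.fullmatch(r"[A-Z]+")
states_only = df[is_state]
print(states_only["States/cities"].tolist())  # ['FL', 'CA', 'WA']

# Hypothetical mixed-case value that merely ends in capitals: rejected.
mixed = pd.Series(["FL", "Miami FL"])
mixed_result = mixed.str.fullmatch(r"[A-Z]+").tolist()
print(mixed_result)  # [True, False]
```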
Data
from io import StringIO
import pandas as pd
txt = '''"States/cities" B C D
0 FL 3 5 6
1 Orlando 1 2 3
2 Miami 1 1 3
3 Jacksonville 1 2 0
4 CA 8 3 2
5 "San diego" 3 1 0
6 "San Francisco" 5 2 2
7 WA 4 2 1
8 Seattle 3 1 0
9 Tacoma 1 1 1'''
states = pd.read_table(StringIO(txt), sep=r"\s+")
You can get the rows with all-uppercase values in the States/cities column like this:
df.loc[df['States/cities'].str.isupper()]
States/cities B C D
0 FL 3 5 6
4 CA 8 3 2
7 WA 4 2 1
Just to be safe, you can add a condition so that it only returns the rows where 'States/cities' is uppercase and exactly 2 characters long (in case you had a value such as SEATTLE):
df.loc[(df['States/cities'].str.isupper()) & (df['States/cities'].apply(len) == 2)]
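To see why the extra length check can matter, here is a small sketch on a hypothetical sample in which a city name happens to be written in all caps. It uses the vectorized .str.len() in place of .apply(len); both should give the same lengths for string values.

```python
import pandas as pd

# Hypothetical sample: "SEATTLE" is an all-caps city added for illustration.
sample = pd.DataFrame({
    "States/cities": ["WA", "SEATTLE", "Tacoma"],
    "B": [4, 3, 1],
})
col = sample["States/cities"]

# isupper alone keeps the all-caps city as well as the state.
upper_only = sample.loc[col.str.isupper()]
print(upper_only["States/cities"].tolist())  # ['WA', 'SEATTLE']

# Adding the length condition keeps only the 2-letter abbreviation.
upper_two = sample.loc[col.str.isupper() & (col.str.len() == 2)]
print(upper_two["States/cities"].tolist())  # ['WA']
```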
You can write a function to be applied to each value in the States/cities column. Have the function return either True or False, and the result of applying the function can act as a Boolean filter on your DataFrame.
This is a common pattern when working with pandas. In your particular case, you could check whether each value in States/cities is made of only uppercase letters.
So for example:
def is_state_abbrev(string):
    return string.isupper()
filter = d['States/cities'].apply(is_state_abbrev)
filtered_df = d[filter]
Here filter will be a pandas Series of True and False values.
You can also achieve the same result by using a lambda expression, as in:
filtered_df = d[d['States/cities'].apply(lambda x: x.isupper())]
This does essentially the same thing.
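One thing to watch with the apply approach: if the column contains a missing value, calling .isupper() on it raises an AttributeError, because NaN is a float. The sketch below assumes a hypothetical sample with one missing entry and guards the check with isinstance; treating missing values as "not a state" is an assumption, not something the question specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value in the column.
sample = pd.DataFrame({
    "States/cities": ["FL", "Orlando", np.nan, "CA"],
    "B": [3, 1, 0, 8],
})

def is_state_abbrev(value):
    # Non-string values (such as NaN) are treated as "not a state".
    return isinstance(value, str) and value.isupper()

mask = sample["States/cities"].apply(is_state_abbrev)
filtered = sample[mask]
print(filtered["States/cities"].tolist())  # ['FL', 'CA']
```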
If we assume the order is always a state followed by the cities in that state, we can use where and dropna:
df['States/cities'] = df['States/cities'].where(df['States/cities'].isin(['FL','CA','WA']))
df = df.dropna()
df
States/cities B C D
0 FL 3 5 6
4 CA 8 3 2
7 WA 4 2 1
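When the state codes are known in advance, the same rows can also be selected without overwriting the column. The sketch below, on data rebuilt from the question's table, compares a direct isin mask with where followed by dropna on a copy; the names state_codes, masked, and via_dropna are illustrative only.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "Miami", "Jacksonville", "CA",
                      "San diego", "San Francisco", "WA", "Seattle", "Tacoma"],
    "B": [3, 1, 1, 1, 8, 3, 5, 4, 3, 1],
    "C": [5, 2, 1, 2, 3, 1, 2, 2, 1, 1],
    "D": [6, 3, 3, 0, 2, 0, 2, 1, 0, 1],
})
state_codes = ["FL", "CA", "WA"]

# Option 1: use isin directly as a Boolean mask; df itself is not modified.
states_only = df[df["States/cities"].isin(state_codes)]

# Option 2: blank out non-states on a copy, then drop rows where that
# column is missing (subset= avoids dropping rows for NaNs elsewhere).
masked = df.copy()
masked["States/cities"] = masked["States/cities"].where(
    masked["States/cities"].isin(state_codes)
)
via_dropna = masked.dropna(subset=["States/cities"])

print(states_only["States/cities"].tolist())  # ['FL', 'CA', 'WA']
```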
Or we can use str.len:
df[df['States/cities'].str.len()==2]
States/cities B C D
0 FL 3 5 6
4 CA 8 3 2
7 WA 4 2 1
You can use str.contains to filter out any row that contains lowercase letters:
df[~df['States/cities'].str.contains('[a-z]')]
States/cities B C D
0 FL 3 5 6
4 CA 8 3 2
7 WA 4 2 1
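If the column may contain missing values, str.contains returns a missing result for them, and negating that mask can fail or behave unexpectedly. A minimal sketch, assuming a hypothetical sample with one NaN and assuming missing rows should be excluded: the na parameter supplies the fill value for missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value in the column.
sample = pd.DataFrame({
    "States/cities": ["FL", "Orlando", np.nan, "WA"],
    "B": [3, 1, 0, 4],
})

# na=True marks missing values as "contains lowercase", so the negated
# mask excludes them along with the city names.
has_lower = sample["States/cities"].str.contains(r"[a-z]", regex=True, na=True)
no_lower = sample[~has_lower]
print(no_lower["States/cities"].tolist())  # ['FL', 'WA']
```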