Select rows from a DataFrame based on string values in a column in pandas

How do I select rows from a DataFrame based on string values in a column in pandas? I want to display only the states, which are written in all caps. Each state row holds the totals for its cities.

import pandas as pd

# Load the data and show it
d = pd.read_csv("states.csv")
print(d)
# States/cities           B  C   D
# 0  FL                   3  5   6
# 1  Orlando              1  2   3
# 2  Miami                1  1   3
# 3  Jacksonville         1  2   0
# 4  CA                   8  3   2
# 5  San diego            3  1   0
# 6  San Francisco        5  2   2
# 7  WA                   4  2   1
# 8  Seattle              3  1   0 
# 9  Tacoma               1  1   1

How can I display it like this?

# States/cities       B  C   D
# 0  FL               3  5   6
# 4  CA               8  3   2
# 7  WA               4  2   1

Consider pandas.Series.str.match, passing a regex that accepts only the characters [A-Z]:

states[states['States/cities'].str.match('^[A-Z]+$')]

#   States/cities  B  C  D
# 0            FL  3  5  6
# 4            CA  8  3  2
# 7            WA  4  2  1
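
As a rough sketch of why the anchoring of the pattern matters: a pattern that only checks the final character, such as '^.*[A-Z]$', also accepts mixed-case strings, whereas requiring every character to be in [A-Z] does not. The sample values below (including "McA") are invented for illustration, and str.fullmatch assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical sample values; "McA" is invented to expose the difference.
s = pd.Series(["FL", "Orlando", "CA", "San diego", "McA"])

# True for any string whose last character is an uppercase letter.
ends_upper = s.str.match(r'^.*[A-Z]$')

# True only for strings made up entirely of uppercase ASCII letters.
all_upper = s.str.fullmatch(r'[A-Z]+')

print(s[ends_upper].tolist())
print(s[all_upper].tolist())
```

With data like the table in the question both patterns select the same rows, so the stricter pattern only matters if mixed-case names can occur.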

Data

from io import StringIO
import pandas as pd

txt = '''"States/cities"           B  C   D
0  FL                   3  5   6
1  Orlando              1  2   3
2  Miami                1  1   3
3  Jacksonville         1  2   0
4  CA                   8  3   2
5  "San diego"            3  1   0
6  "San Francisco"        5  2   2
7  WA                   4  2   1
8  Seattle              3  1   0 
9  Tacoma               1  1   1'''

states = pd.read_table(StringIO(txt), sep=r"\s+")

You can get the rows whose value in the States/cities column is all uppercase like this:

df.loc[df['States/cities'].str.isupper()]

  States/cities  B  C  D
0            FL  3  5  6
4            CA  8  3  2
7            WA  4  2  1

To be safe, you can add a condition so that only rows where 'States/cities' is uppercase and exactly 2 characters long are returned (in case a value such as SEATTLE appears):

df.loc[(df['States/cities'].str.isupper()) & (df['States/cities'].apply(len) == 2)]
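
The two conditions can also be folded into one pattern. The following is a minimal sketch, assuming pandas 1.1 or later for str.fullmatch; the sample frame, including the SEATTLE and missing-value rows, is made up for illustration.

```python
import pandas as pd

# Hypothetical data with an all-caps city name and a missing value.
df = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "SEATTLE", None, "CA"],
    "B": [3, 1, 3, 0, 8],
})

# Exactly two uppercase ASCII letters; na=False treats missing names as non-matches.
mask = df["States/cities"].str.fullmatch(r"[A-Z]{2}", na=False)

result = df.loc[mask]
print(result)
```

Passing na=False keeps the mask strictly Boolean, so it can be used for indexing even when the column contains missing values.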

You can write a function to be applied to each value in the States/cities column. Have the function return either True or False; the result of applying it can then act as a Boolean filter on your DataFrame.

This is a common pattern when working with pandas. In your particular case, you could check whether each value in States/cities consists only of uppercase letters.

For example:

def is_state_abbrev(string):
    return string.isupper()

filter = d['States/cities'].apply(is_state_abbrev)
filtered_df = d[filter]

Here filter will be a pandas Series of True and False values.

You can also achieve the same result with a lambda expression:

filtered_df = d[d['States/cities'].apply(lambda x: x.isupper())]

This does essentially the same thing.
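
One reason to prefer a named function over the built-in string methods is that it can encode a stricter rule. The sketch below is illustrative only: the sample data is a subset of the question's table, and the "exactly two letters" rule is an assumption about what counts as a state abbreviation. It also compares the result with the vectorized .str.isupper().

```python
import pandas as pd

d = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "Miami", "CA", "San diego", "WA"],
    "B": [3, 1, 1, 8, 3, 4],
})

def is_state_abbrev(value):
    # Assumed rule: a string of exactly two uppercase letters.
    return (
        isinstance(value, str)
        and len(value) == 2
        and value.isalpha()
        and value.isupper()
    )

# Element-wise Python call versus the vectorized string accessor.
mask_apply = d["States/cities"].apply(is_state_abbrev)
mask_vector = d["States/cities"].str.isupper()

filtered_df = d[mask_apply]
print(filtered_df)
print((mask_apply == mask_vector).all())
```

On this sample both masks agree; they would differ for a value such as SEATTLE, which is uppercase but not two characters long.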

If we assume each state row is always followed by the cities of that state, we can use where and dropna:

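# Keep the name only on state rows; every other row becomes NaN
df['States/cities'] = df['States/cities'].where(df['States/cities'].isin(['FL', 'CA', 'WA']))

# dropna returns a new DataFrame, so assign the result back
df = df.dropna()
df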
  States/cities  B  C  D
0            FL  3  5  6
4            CA  8  3  2
7            WA  4  2  1
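
The ordering assumption also allows each city row to be labelled with its state by forward-filling the masked column. This is only a sketch of that idea, not part of the answer above: the State column and the variable names are introduced here for illustration, and the data reproduces the States/cities and B columns from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "Miami", "Jacksonville", "CA",
                      "San diego", "San Francisco", "WA", "Seattle", "Tacoma"],
    "B": [3, 1, 1, 1, 8, 3, 5, 4, 3, 1],
})

names = df["States/cities"]
is_state = names.isin(["FL", "CA", "WA"])

# Keep the name on state rows, blank it elsewhere, then carry the last state forward.
df["State"] = names.where(is_state).ffill()

# Sum column B over the city rows of each state.
cities = df[~is_state]
city_totals = cities.groupby("State", sort=False)["B"].sum()
print(city_totals)
```

For column B the city sums equal the values on the state rows, which matches the question's statement that the state rows hold the totals.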

Or use str.len to keep only the two-character values:

df[df['States/cities'].str.len()==2]
  States/cities  B  C  D
0            FL  3  5  6
4            CA  8  3  2
7            WA  4  2  1

You can use str.contains to filter out any row that contains lowercase letters:

df[~df['States/cities'].str.contains('[a-z]')]

    States/cities   B   C   D
0   FL              3   5   6
4   CA              8   3   2
7   WA              4   2   1
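
If the column can contain missing values, the mask from str.contains needs a rule for them before it can be negated. Here is a minimal sketch, assuming a made-up frame with one missing name; na=True counts a missing value as "contains lowercase" so that the row is dropped.

```python
import pandas as pd

# Hypothetical data with one missing name.
df = pd.DataFrame({
    "States/cities": ["FL", "Orlando", None, "CA", "San diego", "WA"],
    "B": [3, 1, 0, 8, 3, 4],
})

# True where the name contains at least one lowercase ASCII letter;
# missing names are treated as True so they are excluded below.
has_lower = df["States/cities"].str.contains(r"[a-z]", regex=True, na=True)

result = df[~has_lower]
print(result)
```

Using na=False instead would keep rows with missing names in the result, so the choice depends on how such rows should be treated.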
