Selecting rows from a DataFrame based on string values in a pandas column

Question

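How do I select rows from a DataFrame based on string values in a pandas column? I want to display only the states, which are the entries written in ALL CAPS. Each state row holds the totals for its cities.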

import pandas as pd
import matplotlib.pyplot as plt
%pylab inline
d = pd.read_csv("states.csv")
print(d)
# States/cities           B  C   D
# 0  FL                   3  5   6
# 1  Orlando              1  2   3
# 2  Miami                1  1   3
# 3  Jacksonville         1  2   0
# 4  CA                   8  3   2
# 5  San diego            3  1   0
# 6  San Francisco        5  2   2
# 7  WA                   4  2   1
# 8  Seattle              3  1   0 
# 9  Tacoma               1  1   1

The output should look like this:

# States/cities           B  C   D
# 0  FL                   3  5   6
# 4  CA                   8  3   2
# 7  WA                   4  2   1

Answer 1

Consider pandas.Series.str.match with a regular expression that accepts only the letters [A-Z]:

states[states['States/cities'].str.match('^[A-Z]+$')]

#   States/cities  B  C  D
# 0            FL  3  5  6
# 4            CA  8  3  2
# 7            WA  4  2  1

Data

from io import StringIO
import pandas as pd

txt = '''"States/cities"           B  C   D
0  FL                   3  5   6
1  Orlando              1  2   3
2  Miami                1  1   3
3  Jacksonville         1  2   0
4  CA                   8  3   2
5  "San diego"            3  1   0
6  "San Francisco"        5  2   2
7  WA                   4  2   1
8  Seattle              3  1   0 
9  Tacoma               1  1   1'''

# Raw string so that \s is passed to the regex engine unchanged
states = pd.read_table(StringIO(txt), sep=r"\s+")
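As a minimal sketch of the regex approach, the snippet below builds a small frame inline and compares a pattern that only checks the last character with one that requires every character to be an uppercase letter. The `McX` row is a made-up value added only to show where the two patterns differ.

```python
import pandas as pd

# Small inline sample; "McX" is a hypothetical row added for illustration only.
states = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "CA", "San Francisco", "McX"],
    "B": [3, 1, 8, 5, 0],
})

col = states["States/cities"]

# str.match anchors at the start of each string. This pattern only requires
# the final character to be an uppercase letter.
ends_upper = states[col.str.match(r"^.*[A-Z]$")]

# This pattern requires every character to be an uppercase ASCII letter.
all_upper = states[col.str.match(r"^[A-Z]+$")]

print(ends_upper["States/cities"].tolist())
print(all_upper["States/cities"].tolist())
```

On the sample data from the question both patterns select the same three rows, because every city name there ends in a lowercase letter.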

Answer 2

You can get the rows whose value in the States/cities column is all uppercase like this:

df.loc[df['States/cities'].str.isupper()]

  States/cities  B  C  D
0            FL  3  5  6
4            CA  8  3  2
7            WA  4  2  1

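To be safe, you can add a second condition so that only rows where 'States/cities' is uppercase and exactly 2 characters long are returned (in case one of your values is SEATTLE or something similar):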

df.loc[(df['States/cities'].str.isupper()) & (df['States/cities'].apply(len) == 2)]
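A self-contained sketch of the two conditions, assuming a hypothetical all-caps city row (`SEATTLE`) to show why the length check matters. It uses `str.len()` in place of `apply(len)`; on a column of strings the two give the same result.

```python
import pandas as pd

# Inline sample; the "SEATTLE" row is a hypothetical value for illustration.
df = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "WA", "SEATTLE"],
    "B": [3, 1, 4, 3],
})

name = df["States/cities"]

# Uppercase only: this also keeps the all-caps city.
upper_only = df.loc[name.str.isupper()]

# Uppercase and exactly two characters long: keeps only the abbreviations.
upper_two = df.loc[name.str.isupper() & (name.str.len() == 2)]

print(upper_only)
print(upper_two)
```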

Answer 3

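You can write a function to apply to each value in the States/cities column. Have the function return True or False; the result of applying it can then act as a boolean filter on the DataFrame.

This is a common pattern when working with pandas. In your particular case, you can check whether each value in States/cities consists only of uppercase letters.

So, for example: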

def is_state_abbrev(string):
    return string.isupper()

# Named mask rather than filter to avoid shadowing the built-in filter()
mask = d['States/cities'].apply(is_state_abbrev)
filtered_df = d[mask]

Here mask is a pandas Series of True and False values.

You can also get the same result with a lambda expression, like this:

filtered_df = d[d['States/cities'].apply(lambda x: x.isupper())]

This is effectively the same thing.
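One thing the function-based approach makes easy is handling values that are not strings. The sketch below assumes the column may contain a missing value (the `None` row is made up for illustration) and treats it as not a state. The helper name `is_state_abbrev` follows the answer; the type guard and the `isalpha()` check are additions for this example.

```python
import pandas as pd

# Inline sample; the None row is a hypothetical missing value.
d = pd.DataFrame({
    "States/cities": ["FL", "Orlando", None, "CA"],
    "B": [3, 1, 0, 8],
})

def is_state_abbrev(value):
    """Return True only for alphabetic strings written entirely in uppercase."""
    # Missing values arrive as None/NaN rather than str, so reject them first.
    return isinstance(value, str) and value.isalpha() and value.isupper()

# Boolean Series: one True/False per row.
mask = d["States/cities"].apply(is_state_abbrev)
filtered_df = d[mask]
print(filtered_df)
```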

Answer 4

If we assume the order is always a state followed by that state's cities, we can use where and dropna:

df['States/cities'] = df['States/cities'].where(df['States/cities'].isin(['FL', 'CA', 'WA']))

# dropna returns a new DataFrame, so assign the result back
df = df.dropna()
df
  States/cities  B  C  D
0            FL  3  5  6
4            CA  8  3  2
7            WA  4  2  1

Or we can use str.len:

df[df['States/cities'].str.len() == 2]
  States/cities  B  C  D
0            FL  3  5  6
4            CA  8  3  2
7            WA  4  2  1
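A sketch of the `where`/`dropna` route that leaves the original frame untouched, assuming the set of state abbreviations is known in advance (the `known_states` list is an assumption of this example). It also shows that the same rows can be picked directly with the `isin` mask.

```python
import pandas as pd

df = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "CA", "San diego", "WA", "Tacoma"],
    "B": [3, 1, 8, 3, 4, 1],
})

known_states = ["FL", "CA", "WA"]  # assumed to be known up front
is_state = df["States/cities"].isin(known_states)

# Work on a copy so the original frame keeps its city rows.
masked = df.copy()
# where() keeps values where the condition holds and puts NaN elsewhere.
masked["States/cities"] = masked["States/cities"].where(is_state)

# dropna() returns a new frame; restricting it to the masked column means
# missing values in other columns do not remove rows.
states_only = masked.dropna(subset=["States/cities"])

# The same rows selected directly with the boolean mask.
direct = df[is_state]
print(states_only)
```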

Answer 5

You can use str.contains to filter out any row that contains a lowercase letter:

df[~df['States/cities'].str.contains('[a-z]')]

    States/cities   B   C   D
0   FL              3   5   6
4   CA              8   3   2
7   WA              4   2   1

5 answers

Answer 1  1  2018-04-04 01:49:54
Answer 2  1  2018-04-04 02:03:57
Answer 3  0  2018-04-04 01:34:53
Answer 4  0  2018-04-04 02:02:06
Answer 5  0  2018-04-04 02:12:05
