Select rows where the value in column A starts with the value in column B

Question

I have a pandas dataframe and want to select the rows where the value in one column starts with the value in another column. I tried the following:

import pandas as pd

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'],
                   'B': ['app', 'b', 'aa']})

df_subset = df[df['A'].str.startswith(df['B'])]

But it raises an error, and this solution I found did not help either.

KeyError: "None of [Float64Index([nan, nan, nan], dtype='float64')] are in the [columns]"

np.where(df['A'].str.startswith(df['B']), True, False) from here also returns True for every row.

Answer 1

For a row-wise comparison, we can use DataFrame.apply:

m = df.apply(lambda x: x['A'].startswith(x['B']), axis=1)
df[m]

       A    B
0  apple  app
2     aa   aa

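Your code does not work because Series.str.startswith accepts a character sequence (a string scalar), and you are passing a pandas Series. Quoting the docs:

pat : str
Character sequence. Regular expressions are not accepted.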
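
To illustrate the difference, here is a minimal sketch using the question's dataframe. The prefix 'ap' is chosen only for illustration: a scalar pat is tested against every value in A, whereas a per-row prefix taken from B has to be paired with its own row explicitly.

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'],
                   'B': ['app', 'b', 'aa']})

# One scalar prefix ('ap', an illustrative choice) is tested against every value in A.
scalar_mask = df['A'].str.startswith('ap')
print(scalar_mask.tolist())   # [True, False, False]

# A per-row prefix from B is compared with the value of A in the same row.
rowwise_mask = df.apply(lambda row: row['A'].startswith(row['B']), axis=1)
print(rowwise_mask.tolist())  # [True, False, True]
```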

Answer 2

You may need a for loop, because str.startswith does not support row-wise checks:

[x.startswith(y) for x , y in zip(df.A,df.B)]
Out[380]: [True, False, True]
df_sub=df[[x.startswith(y) for x , y in zip(df.A,df.B)]].copy()
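
If the columns may contain missing values, the comprehension above would fail on a non-string entry. A possible variant, sketched under the assumption that such rows should simply count as non-matches (the None in column A is made-up sample data, not from the question):

```python
import pandas as pd

# Sample data with a missing value in A (an assumption for illustration).
df = pd.DataFrame({'A': ['apple', 'xyz', None],
                   'B': ['app', 'b', 'aa']})

# Rows where either value is not a string are treated as non-matches.
mask = [isinstance(a, str) and isinstance(b, str) and a.startswith(b)
        for a, b in zip(df['A'], df['B'])]

df_sub = df[mask].copy()
print(mask)  # [True, False, False]
```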

Answer 3

You can achieve this without using a for loop:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'],
                   'B': ['app', 'b', 'aa']})

ufunc = np.frompyfunc(str.startswith, 2, 1)
idx = ufunc(df['A'], df['B'])
df[idx]

Out[22]: 
       A    B
0  apple  app
2     aa   aa
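
One detail worth knowing: np.frompyfunc always returns object-dtype results. A small sketch of the same idea that converts the result to a plain boolean array before indexing (the names starts_with and mask are illustrative choices, not part of the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'],
                   'B': ['app', 'b', 'aa']})

# Wrap str.startswith as a ufunc taking 2 inputs and returning 1 output.
starts_with = np.frompyfunc(str.startswith, 2, 1)

# The ufunc yields an object array; cast it to a real boolean mask.
mask = starts_with(df['A'].to_numpy(), df['B'].to_numpy()).astype(bool)

print(mask.dtype)  # bool
print(df[mask])
```

The Python function is still called once per element, so this mainly tidies the code rather than making it truly vectorized.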

Answer 1
4 accepted 2020-06-23 13:50:34

Answer 2
3 2020-06-23 13:45:32

Answer 3
1 2020-06-23 14:00:50
