Unable to subset a Pandas DataFrame

Question

I have the following data in a DataFrame named in_file:

Client  Value_01   Value_02   Date
ABC     100       500       2016-09-01T
ABC     14        90        2016-09-02T
DEF     95        1000      2016-09-01T
DEF     200       600       2016-09-02T
GHI     75        19        2016-09-01T
GHI     300       700       2016-09-02T
JKL     50        02        2016-09-01T
JKL     400       800       2016-09-02T
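
For anyone who wants to reproduce the problem, the table above can be rebuilt roughly as follows. This is only a sketch: the integer dtypes for the two value columns and the plain-string Date column are assumptions, since the question does not show how in_file was loaded.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: Value_01/Value_02 are integers and Date is a plain string.
in_file = pd.DataFrame({
    "Client": ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "JKL"],
    "Value_01": [100, 14, 95, 200, 75, 300, 50, 400],
    "Value_02": [500, 90, 1000, 600, 19, 700, 2, 800],
    "Date": ["2016-09-01T", "2016-09-02T"] * 4,
})

print(in_file)
```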

I subset the DataFrame with the following (let's call it "Subset 1"):

df_01 = in_file.loc[(in_file.Date == '2016-09-01T') & (in_file.Client != 'ABC') & (in_file.Client != 'DEF')].sort_values('Value_01', ascending=False)

I get back:

Client  Value_01   Value_02   Date
GHI     75        19        2016-09-01T
JKL     50        02        2016-09-01T

Then I tried to subset the DataFrame with the following (let's call it "Subset 2"):

df_02 = in_file.loc[(in_file.Date == '2016-09-01T') & (in_file.Client == 'ABC') & (in_file.Client == 'DEF')].sort_values('Value_01', ascending=False)

With "Subset 2" I get an empty DataFrame. However, I expected to see the following:

Client  Value_01   Value_02   Date
ABC     100       500       2016-09-01T
DEF     95        1000      2016-09-01T

Does anyone know why the "Subset 2" code does not return the DataFrame I expect?

Thanks in advance.

Answer 1

To include, use isin():

In [28]: in_file.loc[(in_file.Date == '2016-09-01T') & in_file.Client.isin(['ABC', 'DEF'])].sort_values('Value_01', ascending=False)
Out[28]:
  Client  Value_01  Value_02         Date
0    ABC       100       500  2016-09-01T
2    DEF        95      1000  2016-09-01T

To exclude:

In [29]: in_file.loc[(in_file.Date == '2016-09-01T') & (~in_file.Client.isin(['ABC', 'DEF']))].sort_values('Value_01', ascending=False)
Out[29]:
  Client  Value_01  Value_02         Date
4    GHI        75        19  2016-09-01T
6    JKL        50         2  2016-09-01T

Or the slightly slower but nicer query() method:

In [30]: in_file.query("Date == '2016-09-01T' and Client in ['ABC', 'DEF']")
Out[30]:
  Client  Value_01  Value_02         Date
0    ABC       100       500  2016-09-01T
2    DEF        95      1000  2016-09-01T

In [31]: in_file.query("Date == '2016-09-01T' and Client not in ['ABC', 'DEF']")
Out[31]:
  Client  Value_01  Value_02         Date
4    GHI        75        19  2016-09-01T
6    JKL        50         2  2016-09-01T
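
If the client list or the date lives in a Python variable, query() can refer to it with the @ prefix instead of hard-coding it in the string. A minimal sketch, assuming the sample data from the question; the names clients and target_date are made up for illustration:

```python
import pandas as pd

# Sample data from the question (dtypes are an assumption).
in_file = pd.DataFrame({
    "Client": ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "JKL"],
    "Value_01": [100, 14, 95, 200, 75, 300, 50, 400],
    "Value_02": [500, 90, 1000, 600, 19, 700, 2, 800],
    "Date": ["2016-09-01T", "2016-09-02T"] * 4,
})

# Hypothetical local variables, referenced inside the query string with @.
clients = ["ABC", "DEF"]
target_date = "2016-09-01T"

included = in_file.query("Date == @target_date and Client in @clients")
excluded = in_file.query("Date == @target_date and Client not in @clients")

print(included)
print(excluded)
```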

Answer 2

Your second subset has two conditions that contradict each other:

(in_file.Client == 'ABC') & (in_file.Client == 'DEF')

Both can never be true for the same row.
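
One way to see this is to look at the boolean masks themselves. The sketch below assumes the sample data from the question; the names is_abc, is_def, both and either are only for illustration.

```python
import pandas as pd

# Sample data from the question (dtypes are an assumption).
in_file = pd.DataFrame({
    "Client": ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "JKL"],
    "Value_01": [100, 14, 95, 200, 75, 300, 50, 400],
    "Value_02": [500, 90, 1000, 600, 19, 700, 2, 800],
    "Date": ["2016-09-01T", "2016-09-02T"] * 4,
})

is_abc = in_file.Client == "ABC"
is_def = in_file.Client == "DEF"

both = is_abc & is_def    # True only where a row is ABC and DEF at once
either = is_abc | is_def  # True where a row is ABC or DEF

print(both.any())    # False: no row satisfies both tests
print(either.sum())  # 4: two ABC rows plus two DEF rows
```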

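What you seem to be looking for is "or" logic rather than "&" logic. On pandas Series that means the | operator; the Python keyword or raises a ValueError when applied to a Series. So

df_02 = in_file.loc[(in_file.Date == '2016-09-02T') | (in_file.Client == 'ABC') | (in_file.Client == 'DEF')].sort_values('Value_01', ascending=False)

will give you every row that matches at least one of the three conditions:

JKL     400       800       2016-09-02T
GHI     300       700       2016-09-02T
DEF     200       600       2016-09-02T
ABC     100       500       2016-09-01T
DEF     95        1000      2016-09-01T
ABC     14        90        2016-09-02T

To get only the 2016-09-01T rows for ABC and DEF, keep & for the date test and use | only between the two Client tests.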
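
A note on operators: element-wise "or" between boolean Series is spelled |, and the plain Python keyword or does not work here because pandas refuses to reduce a whole Series to a single True or False. A small sketch of the difference, assuming the sample data from the question (error_message and mask are illustrative names):

```python
import pandas as pd

# Sample data from the question (dtypes are an assumption).
in_file = pd.DataFrame({
    "Client": ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "JKL"],
    "Value_01": [100, 14, 95, 200, 75, 300, 50, 400],
    "Value_02": [500, 90, 1000, 600, 19, 700, 2, 800],
    "Date": ["2016-09-01T", "2016-09-02T"] * 4,
})

# The keyword `or` asks for the truth value of the whole Series,
# which pandas rejects with a ValueError.
error_message = None
try:
    (in_file.Date == "2016-09-02T") or (in_file.Client == "ABC")
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# The | operator combines the masks element by element.
mask = (
    (in_file.Date == "2016-09-02T")
    | (in_file.Client == "ABC")
    | (in_file.Client == "DEF")
)
print(in_file.loc[mask].sort_values("Value_01", ascending=False))
```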

Answer 3

Warning: this is not the best solution!!!
I just want to point out what you did wrong.
@MaxU has the best answer.

Define cond2:

cond2 = (in_file.Date == '2016-09-01T') & \
        (in_file.Client == 'ABC') & \
        (in_file.Client == 'DEF')

This will always be False, because in_file.Client can never be both 'ABC' and 'DEF' at the same time. You have to use "or" (|) instead:

cond2 = (in_file.Date == '2016-09-01T') & \
        ((in_file.Client == 'ABC') | (in_file.Client == 'DEF'))

Then:

df_02 = in_file.loc[cond2].sort_values('Value_01', ascending=False)
df_02

But don't pick this answer.

It's not as good as using isin.
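
To make the comparison concrete, the sketch below builds the condition both ways and checks that they select the same rows. It assumes the sample data from the question; cond_or and cond_isin are illustrative names.

```python
import pandas as pd

# Sample data from the question (dtypes are an assumption).
in_file = pd.DataFrame({
    "Client": ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "JKL"],
    "Value_01": [100, 14, 95, 200, 75, 300, 50, 400],
    "Value_02": [500, 90, 1000, 600, 19, 700, 2, 800],
    "Date": ["2016-09-01T", "2016-09-02T"] * 4,
})

on_date = in_file.Date == "2016-09-01T"

# Chained | comparisons, as in cond2 above.
cond_or = on_date & ((in_file.Client == "ABC") | (in_file.Client == "DEF"))

# The same test written with isin, which scales better to longer lists.
cond_isin = on_date & in_file.Client.isin(["ABC", "DEF"])

print((cond_or == cond_isin).all())  # True: both masks pick the same rows

df_02 = in_file.loc[cond_isin].sort_values("Value_01", ascending=False)
print(df_02)
```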

Answer 1: 2, accepted, 2016-09-14 20:22:52

Answer 2: 0, 2016-09-14 20:27:41

Answer 3: 0, 2016-09-14 20:28:45
