
Unable to subset Pandas dataframe

I have the following data in a data frame named in_file:

Client  Value_01   Value_02   Date
ABC     100       500       2016-09-01T
ABC     14        90        2016-09-02T
DEF     95        1000      2016-09-01T
DEF     200       600       2016-09-02T
GHI     75        19        2016-09-01T
GHI     300       700       2016-09-02T
JKL     50        02        2016-09-01T
JKL     400       800       2016-09-02T
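For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: the dtypes are an assumption (integer value columns, Date kept as plain strings), since the question does not say how in_file was loaded.

```python
import pandas as pd

# Sample data copied from the question. Assumed dtypes: integers for the
# value columns and plain strings for Date (the question does not say).
in_file = pd.DataFrame({
    "Client": ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "JKL"],
    "Value_01": [100, 14, 95, 200, 75, 300, 50, 400],
    "Value_02": [500, 90, 1000, 600, 19, 700, 2, 800],
    "Date": ["2016-09-01T", "2016-09-02T"] * 4,
})

print(in_file)
```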

I subset the data frame with the following (which we'll call 'subset 1'):

df_01 = in_file.loc[(in_file.Date == '2016-09-01T') & (in_file.Client != 'ABC') & (in_file.Client != 'DEF')].sort_values('Value_01', ascending=False)

and I get back:

Client  Value_01   Value_02   Date
GHI     75        19        2016-09-01T
JKL     50        02        2016-09-01T

Then, I attempt to subset the data frame with the following (which we'll call 'subset 2'):

df_02 = in_file.loc[(in_file.Date == '2016-09-01T') & (in_file.Client == 'ABC') & (in_file.Client == 'DEF')].sort_values('Value_01', ascending=False)

With 'subset 2', I get back an empty data frame, but I was expecting to see the following:

Client  Value_01   Value_02   Date
ABC     100       500       2016-09-01T
DEF     95        1000      2016-09-01T

Does anyone know why the 'subset 2' code is not returning the data frame that I expect?

Thanks in advance.

Including, using isin():

In [28]: in_file.loc[(in_file.Date == '2016-09-01T') & in_file.Client.isin(['ABC', 'DEF'])].sort_values('Value_01', ascending=False)
Out[28]:
  Client  Value_01  Value_02         Date
0    ABC       100       500  2016-09-01T
2    DEF        95      1000  2016-09-01T

Excluding:

In [29]: in_file.loc[(in_file.Date == '2016-09-01T') & (~in_file.Client.isin(['ABC', 'DEF']))].sort_values('Value_01', ascending=False)
Out[29]:
  Client  Value_01  Value_02         Date
4    GHI        75        19  2016-09-01T
6    JKL        50         2  2016-09-01T

Or use the query() method, which is a bit slower but reads much more nicely:

In [30]: in_file.query("Date == '2016-09-01T' and Client in ['ABC', 'DEF']")
Out[30]:
  Client  Value_01  Value_02         Date
0    ABC       100       500  2016-09-01T
2    DEF        95      1000  2016-09-01T

In [31]: in_file.query("Date == '2016-09-01T' and Client not in ['ABC', 'DEF']")
Out[31]:
  Client  Value_01  Value_02         Date
4    GHI        75        19  2016-09-01T
6    JKL        50         2  2016-09-01T
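If the date and the client list live in Python variables rather than being typed into the query string, query() can refer to them with an @ prefix. A minimal sketch, assuming the sample data from the question; the names wanted_date and wanted_clients are made up for illustration:

```python
import pandas as pd

# Sample data from the question (dtypes assumed).
in_file = pd.DataFrame({
    "Client": ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "JKL"],
    "Value_01": [100, 14, 95, 200, 75, 300, 50, 400],
    "Value_02": [500, 90, 1000, 600, 19, 700, 2, 800],
    "Date": ["2016-09-01T", "2016-09-02T"] * 4,
})

# Hypothetical variable names; @name pulls the value from the local scope.
wanted_date = "2016-09-01T"
wanted_clients = ["ABC", "DEF"]

included = in_file.query("Date == @wanted_date and Client in @wanted_clients")
excluded = in_file.query("Date == @wanted_date and Client not in @wanted_clients")

print(included)
print(excluded)
```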

You have two conflicting conditions in your second subset:

(in_file.Client == 'ABC') & (in_file.Client == 'DEF')

These can never both be true for the same row, so the combined mask is False everywhere.
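A quick way to convince yourself is to build the two masks separately and combine them both ways. This sketch uses only the Client column from the question; the variable names are illustrative:

```python
import pandas as pd

# Client column from the question's sample data.
client = pd.Series(
    ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "JKL"], name="Client"
)

is_abc = client == "ABC"
is_def = client == "DEF"

both = is_abc & is_def    # True only where a row is ABC *and* DEF
either = is_abc | is_def  # True where a row is ABC *or* DEF

print(both.any())    # False: no row can hold two different values
print(either.sum())  # 4: two ABC rows plus two DEF rows
```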

What you seem to be looking for is 'or' logic, not 'and' logic. On pandas Series that means the element-wise | operator, not the Python keyword or (which raises a ValueError when applied to a Series). So

df_02 = in_file.loc[(in_file.Date == '2016-09-02T') | (in_file.Client == 'ABC') | (in_file.Client == 'DEF')].sort_values('Value_01', ascending=False)

will give you

JKL     400       800       2016-09-02T
GHI     300       700       2016-09-02T
DEF     200       600       2016-09-02T
ABC     100       500       2016-09-01T
DEF     95        1000      2016-09-01T
ABC     14        90        2016-09-02T

Caveat: this is not the best solution!
I only want to point out what you were doing wrong.
@MaxU has the best answer.
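One thing to watch when writing 'or' conditions in pandas: the Python keyword or asks a whole Series for a single truth value, which pandas refuses to give, so the element-wise | operator is needed. A small sketch on the question's data (the mask names are illustrative):

```python
import pandas as pd

# Sample data from the question (dtypes assumed).
in_file = pd.DataFrame({
    "Client": ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "JKL"],
    "Value_01": [100, 14, 95, 200, 75, 300, 50, 400],
    "Value_02": [500, 90, 1000, 600, 19, 700, 2, 800],
    "Date": ["2016-09-01T", "2016-09-02T"] * 4,
})

date_mask = in_file.Date == "2016-09-02T"
client_mask = in_file.Client == "ABC"

# The keyword `or` needs bool(date_mask), which is ambiguous for a Series.
try:
    combined = date_mask or client_mask
    raised = False
except ValueError as exc:
    raised = True
    print("`or` failed:", exc)

# The element-wise operator combines the masks row by row.
combined = date_mask | client_mask
print(in_file.loc[combined])
```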

Define cond2:

cond2 = (in_file.Date == '2016-09-01T') & \
        (in_file.Client == 'ABC') & \
        (in_file.Client == 'DEF')

This will always be False, because in_file.Client can never be both 'ABC' and 'DEF' in the same row. You must use 'or', which for element-wise comparison is written |.

Instead:

cond2 = (in_file.Date == '2016-09-01T') & \
        ((in_file.Client == 'ABC') | (in_file.Client == 'DEF'))

Then

df_02 = in_file.loc[cond2].sort_values('Value_01', ascending=False)
df_02
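The extra parentheses around the two Client comparisons matter, because & binds more tightly than | in Python. A sketch of the difference on the question's data (mask names are illustrative):

```python
import pandas as pd

# Sample data from the question (dtypes assumed).
in_file = pd.DataFrame({
    "Client": ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "JKL"],
    "Value_01": [100, 14, 95, 200, 75, 300, 50, 400],
    "Value_02": [500, 90, 1000, 600, 19, 700, 2, 800],
    "Date": ["2016-09-01T", "2016-09-02T"] * 4,
})

date_ok = in_file.Date == "2016-09-01T"
is_abc = in_file.Client == "ABC"
is_def = in_file.Client == "DEF"

# Intended: the date filter applies to both clients.
grouped = date_ok & (is_abc | is_def)

# Without parentheses this parses as (date_ok & is_abc) | is_def,
# so every DEF row gets through regardless of its date.
ungrouped = date_ok & is_abc | is_def

print(in_file[grouped])
print(in_file[ungrouped])
```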



But Don't Choose This Answer

It is not as good as using isin
