I have the following data in a data frame named in_file
:
Client Value_01 Value_02 Date
ABC 100 500 2016-09-01T
ABC 14 90 2016-09-02T
DEF 95 1000 2016-09-01T
DEF 200 600 2016-09-02T
GHI 75 19 2016-09-01T
GHI 300 700 2016-09-02T
JKL 50 02 2016-09-01T
JKL 400 800 2016-09-02T
I subset the data frame with the following (which we'll call 'subset 1'):
df_01 = in_file.loc[(in_file.Date == '2016-09-01T') & (in_file.Client <> 'ABC') & (in_file.Client <> 'DEF')].sort_values('Value_01', ascending=False)
and I get back:
Client Value_01 Value_02 Date
GHI 75 19 2016-09-01T
JKL 50 02 2016-09-01T
Then, I attempt to subset the data frame with the following (which we'll call 'subset 2'):
df_02 = in_file.loc[(in_file.Date == '2016-09-01T') & (in_file.Client == 'ABC') & (in_file.Client == 'DEF')].sort_values('Value_01', ascending=False)
With 'subset 2', I get back an empty data frame . But, I was expecting to see the following:
Client Value_01 Value_02 Date
ABC 100 500 2016-09-01T
DEF 95 1000 2016-09-01T
Does anyone know why the 'subset 2' code is not returning the data frame that I expect?
Thanks in advance.
including isin()
:
In [28]: in_file.loc[(in_file.Date == '2016-09-01T') & in_file.Client.isin(['ABC', 'DEF'])].sort_values('Value_01', ascending=False)
Out[28]:
Client Value_01 Value_02 Date
0 ABC 100 500 2016-09-01T
2 DEF 95 1000 2016-09-01T
excluding:
In [29]: in_file.loc[(in_file.Date == '2016-09-01T') & (~in_file.Client.isin(['ABC', 'DEF']))].sort_values('Value_01', ascending=False)
Out[29]:
Client Value_01 Value_02 Date
4 GHI 75 19 2016-09-01T
6 JKL 50 2 2016-09-01T
Or bit slower, but much nicer query()
method:
In [30]: in_file.query("Date == '2016-09-01T' and Client in ['ABC', 'DEF']")
Out[30]:
Client Value_01 Value_02 Date
0 ABC 100 500 2016-09-01T
2 DEF 95 1000 2016-09-01T
In [31]: in_file.query("Date == '2016-09-01T' and Client not in ['ABC', 'DEF']")
Out[31]:
Client Value_01 Value_02 Date
4 GHI 75 19 2016-09-01T
6 JKL 50 2 2016-09-01T
You have two conflicting conditions for your second subset dataframe
(in_file.Client == 'ABC')
& (in_file.Client == 'DEF')
Can never both be true at the same time.
What you seem to be looking for is 'or' logic not '&' logic. So
df_02 = in_file.loc[(in_file.Date == '2016-09-02T') or (in_file.Client == 'ABC') or (in_file.Client == 'DEF')].sort_values('Value_01', ascending=False)
will give you
ABC 100 500 2016-09-01T
ABC 14 90 2016-09-02T
DEF 95 1000 2016-09-01T
DEF 200 600 2016-09-02T
GHI 300 700 2016-09-02T
JKL 400 800 2016-09-02T
Caveat This is not the best solution!!!
I only want to point out what you were doing wrong.
@MaxU has the best answer
define cond2
cond2 = (in_file.Date == '2016-09-01T') & \
(in_file.Client == 'ABC') & \
(in_file.Client == 'DEF')
This will always be False
as in_file.Client
can never be both 'ABC'
and 'DEF'
. You must use 'or' |
Instead
cond2 = (in_file.Date == '2016-09-01T') & \
((in_file.Client == 'ABC') | (in_file.Client == 'DEF'))
Then
df_02 = in_file.loc[cond2].sort_values('Value_01', ascending=False)
df_02
It is not as good as using isin
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.