Recently, I switched from MATLAB to Python with pandas. It has been working great, but I am stuck on solving the following problem efficiently. For my analysis, I have two DataFrames that look somewhat like this:
dfA =
NUM In Date
0 2345 we 1 01/03/16
1 3631 we 1 23/02/16
2 2564 we 1 12/02/16
3 8785 sz 2 01/03/16
4 4767 dt 6 01/03/16
5 3452 dt 7 23/02/16
6 2134 sz 2 01/03/16
7 3465 sz 2 01/03/16
and
dfB
In Count_Num
0 we 1 3
1 sz 2 2
2 dt 6 3
3 dt 7 1
What I would like to perform is an operation that counts the 'NUM' entries for each "In" value in dfA and compares that count with "Count_Num" in dfB. Afterwards, I would like to add a column to dfB that shows whether the comparison is True or False. In the example above, the operation should return this:
dfB
In Count_Num Check
0 we 1 3 True
1 sz 2 2 False
2 dt 6 3 False
3 dt 7 1 True
My approach:
With value_counts() and pd.DataFrame, I constructed the following dfC from dfA:
dfC =
In_Number In_Total
0 we 1 3
1 sz 2 3
2 dt 6 1
3 dt 7 1
Then I merged it with dfB and checked whether the values are the same by comparing the two columns. With this approach, I end up having to drop the extra columns afterwards. Is there a better/faster way to do this? I think there is a way to do this very efficiently with one of pandas' great functions. I've tried to look into lookup and map, but I cannot make it work.
Thanks for the help!
You can try to merge dfB with dfA after applying groupby and count on column In, then add a new column check that compares the merged columns, and finally drop column NUM:
print(dfA)
NUM In Date
0 2345 we 1 01/03/16
1 3631 we 1 23/02/16
2 2564 we 1 12/02/16
3 8785 sz 2 01/03/16
4 4767 dt 6 01/03/16
5 3452 dt 7 23/02/16
6 2134 sz 2 01/03/16
7 3465 sz 2 01/03/16
print(dfB)
In Count_Num
0 we 1 3
1 sz 2 2
2 dt 6 3
3 dt 7 1
print(dfA.groupby('In', as_index=False)['NUM'].count())
In NUM
0 dt 6 1
1 dt 7 1
2 sz 2 3
3 we 1 3
df = pd.merge(dfB, dfA.groupby('In', as_index=False)['NUM'].count(), on=['In'])
print(df)
In Count_Num NUM
0 we 1 3 3
1 sz 2 2 3
2 dt 6 3 1
3 dt 7 1 1
df['check'] = df['NUM'] == df['Count_Num']
df = df.drop('NUM', axis=1)
print(df)
In Count_Num check
0 we 1 3 True
1 sz 2 2 False
2 dt 6 3 False
3 dt 7 1 True
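For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch in Python 3 syntax. It rebuilds the sample frames from the question; it assumes that each "In" value is a single string such as "we 1" (the printed tables do not make this certain), and the intermediate name counts is only introduced here for readability.

```python
import pandas as pd

# Sample data copied from the question.
# Assumption: each "In" value is one string such as "we 1".
dfA = pd.DataFrame({
    "NUM": [2345, 3631, 2564, 8785, 4767, 3452, 2134, 3465],
    "In": ["we 1", "we 1", "we 1", "sz 2", "dt 6", "dt 7", "sz 2", "sz 2"],
    "Date": ["01/03/16", "23/02/16", "12/02/16", "01/03/16",
             "01/03/16", "23/02/16", "01/03/16", "01/03/16"],
})
dfB = pd.DataFrame({
    "In": ["we 1", "sz 2", "dt 6", "dt 7"],
    "Count_Num": [3, 2, 3, 1],
})

# Count the rows of dfA per "In" value (hypothetical helper name: counts)
counts = dfA.groupby("In", as_index=False)["NUM"].count()

# Bring the counts next to Count_Num, compare, then drop the helper column
df = pd.merge(dfB, counts, on="In")
df["check"] = df["NUM"] == df["Count_Num"]
df = df.drop("NUM", axis=1)
print(df)
```

Note that merge defaults to an inner join, so an "In" value that appears in dfB but never in dfA would disappear from the result; passing how='left' would keep such rows, with NaN as the count.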
Or you can use rename instead of drop:
df = pd.merge(dfB, dfA.groupby('In', as_index=False)['NUM'].count(), on=['In'])
print(df)
In Count_Num NUM
0 we 1 3 3
1 sz 2 2 3
2 dt 6 3 1
3 dt 7 1 1
df['NUM'] = df['NUM'] == df['Count_Num']
df = df.rename(columns={'NUM':'Check'})
print(df)
In Count_Num Check
0 we 1 3 True
1 sz 2 2 False
2 dt 6 3 False
3 dt 7 1 True
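Since the question mentions map, here is one possible sketch that skips the merge entirely. It is not part of the answer above, only an alternative under the same assumption that each "In" value is a single string such as "we 1"; the name in_counts is made up for this example.

```python
import pandas as pd

# Sample data copied from the question.
# Assumption: each "In" value is one string such as "we 1".
dfA = pd.DataFrame({
    "NUM": [2345, 3631, 2564, 8785, 4767, 3452, 2134, 3465],
    "In": ["we 1", "we 1", "we 1", "sz 2", "dt 6", "dt 7", "sz 2", "sz 2"],
})
dfB = pd.DataFrame({
    "In": ["we 1", "sz 2", "dt 6", "dt 7"],
    "Count_Num": [3, 2, 3, 1],
})

# Series indexed by "In" value, holding the number of rows per value
in_counts = dfA["In"].value_counts()

# Look up each dfB key in in_counts; keys missing from dfA count as 0
dfB["Check"] = (
    dfB["In"].map(in_counts).fillna(0).astype(int) == dfB["Count_Num"]
)
print(dfB)
```

This writes the result straight into dfB, so there is no helper column to drop or rename afterwards.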