I would like to group df below by Date and ItemId:
Id Timestamp Data ItemId Date
2012-04-21 19389576 2012-04-21 00:04:03.533 39.0 1 2012-04-21
2012-04-21 19389577 2012-04-21 00:04:04.870 38.5 1 2012-04-21
2012-04-21 19389608 2012-04-21 00:07:03.450 38.0 1 2012-04-21
...
2012-04-22 19389609 2012-04-21 00:03:04.817 37.5 2 2012-04-21
2012-04-22 19389620 2012-04-21 00:10:04.400 37.0 2 2012-04-21
...
to get all combinations of Date and ItemId, then select from the original dataframe df using each combination of Date and ItemId, for instance Date == 2012-04-21 and ItemId == 1, Date == 2012-04-21 and ItemId == 2, ...
How do I select data using the 2 columns simultaneously in the for-loop?
When you group by both columns, each row label of the resulting index is a tuple such as (2012-04-21, 1), (2012-04-21, 2), (2012-04-22, 1):
from datetime import datetime
import pandas as pd
import io
s_e=""" Id Timestamp Data ProductId Date
2012-04-21 19389576 2012-04-21 00:04:03.533 39.0 1 2012-04-21
2012-04-21 19389577 2012-04-21 00:04:04.870 38.5 1 2012-04-21
2012-04-21 19389608 2012-04-21 00:07:03.450 38.0 1 2012-04-22
2012-04-22 19389609 2012-04-21 00:03:04.817 37.5 2 2012-04-21
2012-04-22 19389620 2012-04-21 00:10:04.400 37.0 2 2012-04-22
"""
pd.set_option('display.max_columns', None )
df = pd.read_csv(io.StringIO(s_e), sep=' ', parse_dates=[1,4], engine='python')
df=df.groupby(['Date','ProductId']).agg(list)
print('df:\n',df)
print('df.index.values:\n',df.index.values)
Output:
>>>df:
Timestamp Data
Date ProductId
2012-04-21 1 [2012-04-21 00:04:03.533000, 2012-04-21 00:04:04.870000] [39.0, 38.5]
2 [2012-04-21 00:03:04.817000] [37.5]
2012-04-22 1 [2012-04-21 00:07:03.450000] [38.0]
2 [2012-04-21 00:10:04.400000] [37.0]
>>>df.index.values:
[(Timestamp('2012-04-21 00:00:00'), 1)
(Timestamp('2012-04-21 00:00:00'), 2)
(Timestamp('2012-04-22 00:00:00'), 1)
(Timestamp('2012-04-22 00:00:00'), 2)]
You could try something like this to select a specific combination, for example Date == 2012-04-21 and ProductId == 2 (ProductId here plays the role of ItemId in the question):
datetoselect = (datetime.strptime('2012-04-21', '%Y-%m-%d'), 2)  # Date == 2012-04-21 and ProductId == 2
print(df[[i == datetoselect for i in df.index.values]])
Output:
Id Timestamp Data
Date ProductId
2012-04-21 2 [2012-04-22 19389609] [2012-04-21 00:03:04.817000] [37.5]
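As an alternative to the list comprehension over the index, the same row can probably be reached directly with .loc and a tuple key. This is only a minimal sketch: the sample frame below is hypothetical data modelled on the table above (the Timestamp column is left out for brevity), and the name grouped is an assumption, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data modelled on the answer's table.
df = pd.DataFrame({
    "Id": [19389576, 19389577, 19389608, 19389609, 19389620],
    "Data": [39.0, 38.5, 38.0, 37.5, 37.0],
    "ProductId": [1, 1, 1, 2, 2],
    "Date": pd.to_datetime(
        ["2012-04-21", "2012-04-21", "2012-04-22", "2012-04-21", "2012-04-22"]
    ),
})

# One row per (Date, ProductId) combination, each cell holding a list.
grouped = df.groupby(["Date", "ProductId"]).agg(list)

# A tuple key addresses a single row of the MultiIndex.
key = (pd.Timestamp("2012-04-21"), 2)
print(grouped.loc[key, "Data"])

# Iterating over the index yields the same kind of tuples.
for date, product_id in grouped.index:
    print(date.date(), product_id, grouped.loc[(date, product_id), "Data"])
```

Using .loc avoids building a Python list of booleans for every lookup, which may matter when there are many combinations.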
IIUC, if you simply want to print the data for each group, use:
for key, group in df.groupby(['ItemId', 'Date']):
    print(key)
    print(group)
This prints:
(1, '2012-04-21')
Id Timestamp Data ItemId Date
2012-04-21 19389576 2012-04-21 00:04:03.533 39.0 1 2012-04-21
2012-04-21 19389577 2012-04-21 00:04:04.870 38.5 1 2012-04-21
2012-04-21 19389608 2012-04-21 00:07:03.450 38.0 1 2012-04-21
(2, '2012-04-21')
Id Timestamp Data ItemId Date
2012-04-22 19389609 2012-04-21 00:03:04.817 37.5 2 2012-04-21
2012-04-22 19389620 2012-04-21 00:10:04.400 37.0 2 2012-04-21
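If the sub-frames are needed after the loop rather than just printed, one option is to keep them in a dict keyed by the group tuple, or to fetch a single combination with get_group. The sketch below assumes a small hypothetical frame shaped like the question's data, with Date kept as a string and the Timestamp column omitted; the names groups, frames and subset are illustrative only.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's table.
df = pd.DataFrame({
    "Id": [19389576, 19389577, 19389608, 19389609, 19389620],
    "Data": [39.0, 38.5, 38.0, 37.5, 37.0],
    "ItemId": [1, 1, 1, 2, 2],
    "Date": ["2012-04-21"] * 5,
})

groups = df.groupby(["ItemId", "Date"])

# Keep each sub-frame under its (ItemId, Date) key for later lookup.
frames = {key: group for key, group in groups}
print(frames[(1, "2012-04-21")])

# get_group fetches one combination without looping.
subset = groups.get_group((2, "2012-04-21"))
print(subset)
```

Each value in frames is an ordinary DataFrame containing the original rows for that combination, so any further selection or aggregation can be applied to it.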
Try a dual selector: wrap each condition in its own set of parentheses and join them with an ampersand (&):
df[(df["Date"] == "2012-04-21") & (df["ItemId"] == 2)]
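To apply this kind of two-column mask inside a for-loop, as the question asks, the distinct (Date, ItemId) pairs could be collected first and then used one at a time. This is a sketch under assumptions: the frame is hypothetical sample data, and combos and sizes are illustrative names.

```python
import pandas as pd

# Hypothetical sample data; Date is stored as datetime64.
df = pd.DataFrame({
    "Id": [19389576, 19389577, 19389608, 19389609, 19389620],
    "Data": [39.0, 38.5, 38.0, 37.5, 37.0],
    "ItemId": [1, 1, 1, 2, 2],
    "Date": pd.to_datetime(
        ["2012-04-21", "2012-04-21", "2012-04-22", "2012-04-21", "2012-04-22"]
    ),
})

# Every distinct (Date, ItemId) pair that occurs in the data.
combos = df[["Date", "ItemId"]].drop_duplicates()

sizes = {}
for date, item_id in combos.itertuples(index=False, name=None):
    # Both conditions must hold, so combine them with & inside parentheses.
    mask = (df["Date"] == date) & (df["ItemId"] == item_id)
    sizes[(date, item_id)] = int(mask.sum())
    print(date.date(), item_id)
    print(df[mask])
```

For large frames, iterating over df.groupby(["Date", "ItemId"]) is likely cheaper, since the mask approach rescans the whole frame for every combination.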