I have a DataFrame df:
period  store  item
1       32     'A'
1       34     'A'
1       32     'B'
1       34     'B'
2       42     'X'
2       44     'X'
2       42     'Y'
2       44     'Y'
I want to find all the stores for each item in each period, preferably as a dictionary like this:
dicta = {1: {'A': (32, 34),'B': (32, 34)}, 2: {'X': (42, 44),'Y': (42, 44)}}
EDIT for @jezrael
Actual df:
RTYPE PERIOD_ID STORE_ID MKT MTYPE RGROUP RZF RXF
0 MKT 317 13178 Kiosks_11 CELL NaN NaN NaN
1 MKT 306 11437 Kiosks_11 CELL NaN NaN NaN
2 MKT 306 12236 Kiosks_11 CELL NaN NaN NaN
3 MKT 312 11024 Kiosks_11 CELL NaN NaN NaN
4 MKT 307 13010 Kiosks_11 CELL NaN NaN NaN
5 MKT 307 12723 Kiosks_11 CELL NaN NaN NaN
6 MKT 306 14218 Kiosks_11 CELL NaN NaN NaN
7 MKT 306 13547 Kiosks_11 CELL NaN NaN NaN
8 MKT 316 12396 Kiosks_11 CELL NaN NaN NaN
9 MKT 306 10778 Cafes_638 CELL NaN NaN NaN
10 MKT 317 11230 Kiosks_11 CELL NaN NaN NaN
11 MKT 315 13630 Kiosks_11 CELL NaN NaN NaN
12 MKT 314 14113 Bars_13 CELL NaN NaN NaN
13 MKT 314 12089 Kiosks_11 CELL NaN NaN NaN
Here, PERIOD_ID, STORE_ID and MKT are the periods, stores and items respectively. The edit suggested by @jezrael returns this for the above df:
d1={306L: (8207L, 8209L .... 8210L, 8211L),307L:( 8215L, 8219L ... 8233L, 8235L), 308: (8238L, 8239L....8244L, 8252L) ..k:(v) ..}
(Note: shortened here, as the original dictionary is huge.)
It works as expected for the sample data, but not for this DataFrame.
Edit for @jezrael as a Minimal, Reproducible Example.
df=
RTYPE PERIOD_ID STORE_ID MKT MTYPE RGROUP RZF RXF
0 MKT 20171411 3102300001 PM KA+PM PROV+SMKT+PETRO CELL NaN NaN NaN
1 MKT 20171411 3102300002 PM KA+PM PROV+SMKT+PETRO CELL NaN NaN NaN
2 MKT 20171411 3104001193 PM Provision CELL NaN NaN NaN
3 MKT 20171411 3104001193 PM KA+PM PROV+SMKT+PETRO CELL NaN NaN NaN
4 MKT 20171411 3104001193 Provision including MM CELL NaN NaN NaN
5 MKT 20171411 3104001641 PM Provision CELL NaN NaN NaN
6 MKT 20171411 3104001641 PM KA+PM PROV+SMKT+PETRO CELL NaN NaN NaN
7 MKT 20171411 3104001641 Provision including MM CELL NaN NaN NaN
8 MKT 20171411 3104001682 PM Provision CELL NaN NaN NaN
9 MKT 20171411 3104001682 PM KA+PM PROV+SMKT+PETRO CELL NaN NaN NaN
10 MKT 20171411 3104001682 Provision including MM CELL NaN NaN NaN
11 MKT 20171412 3104001682 Alcohol CELL NaN NaN NaN
12 MKT 20171412 3104001682 Fish CELL NaN NaN NaN
13 MKT 20171412 3104001684 Alcohol CELL NaN NaN NaN
14 MKT 20171412 3104001684 Fish CELL NaN NaN NaN
Current output as per @jezrael's code:
{20171411L: ('Provision including MM', 'PM Provision', 'PM KA+PM PROV+SMKT+PETRO'), 20171412L: ('Fish', 'Alcohol')}
Expected output:
{20171411L: ('Provision including MM', 'PM Provision'), 20171412L: ('Fish', 'Alcohol')}
For period 20171411L, the MKTs 'Provision including MM' and 'PM Provision' are duplicates because they have the same set of store_ids. Likewise, for period 20171412L, 'Fish' and 'Alcohol' are duplicates because they have the same set of store_ids.
I am new to pandas but have some basic knowledge of Python. I am really not sure how to achieve this. Any help would be great.
Create a Series with a MultiIndex, then build the nested dictionary in a dict comprehension:
s = df.groupby(['period','item'])['store'].apply(tuple)
d = {level: s.xs(level).to_dict() for level in s.index.levels[0]}
print (d)
{1: {'A': (32, 34), 'B': (32, 34)}, 2: {'X': (42, 44), 'Y': (42, 44)}}
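For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question (with the items stored as plain strings, which is an assumption about the real data), so treat it as an illustration rather than the exact original setup.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (items as plain strings).
df = pd.DataFrame({
    "period": [1, 1, 1, 1, 2, 2, 2, 2],
    "store":  [32, 34, 32, 34, 42, 44, 42, 44],
    "item":   ["A", "A", "B", "B", "X", "X", "Y", "Y"],
})

# One tuple of stores per (period, item) pair; the result has a MultiIndex.
s = df.groupby(["period", "item"])["store"].apply(tuple)

# Slice the Series by each period and turn the remaining level into a dict.
d = {int(level): s.xs(level).to_dict() for level in s.index.levels[0]}
print(d)
```

The `int(level)` cast is only there so the outer keys are plain Python integers instead of NumPy integers.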
EDIT: You can group by period, convert item to sets and then to tuples (the order inside each tuple is arbitrary, because sets are unordered):
d1 = {k:tuple(set(v)) for k, v in df.groupby('period')['item']}
print (d1)
{1: ('A', 'B'), 2: ('X', 'Y')}
d1 = df.groupby('period')['item'].apply(lambda x: tuple(set(x))).to_dict()
print (d1)
{1: ('A', 'B'), 2: ('X', 'Y')}
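The snippets above return every item per period, which does not match the "Expected output" in the question, where only MKTs that share the same set of store_ids within a period should be reported. One possible way to get that is sketched below. It assumes the column names PERIOD_ID, STORE_ID and MKT from the question, rebuilds only those three columns of the second sample by hand, and uses `store_sets`, `groups` and `duplicates` as made-up helper names.

```python
import pandas as pd

# Reduced copy of the second sample: only the three relevant columns.
df = pd.DataFrame({
    "PERIOD_ID": [20171411] * 11 + [20171412] * 4,
    "STORE_ID": [3102300001, 3102300002,
                 3104001193, 3104001193, 3104001193,
                 3104001641, 3104001641, 3104001641,
                 3104001682, 3104001682, 3104001682,
                 3104001682, 3104001682, 3104001684, 3104001684],
    "MKT": ["PM KA+PM PROV+SMKT+PETRO", "PM KA+PM PROV+SMKT+PETRO",
            "PM Provision", "PM KA+PM PROV+SMKT+PETRO", "Provision including MM",
            "PM Provision", "PM KA+PM PROV+SMKT+PETRO", "Provision including MM",
            "PM Provision", "PM KA+PM PROV+SMKT+PETRO", "Provision including MM",
            "Alcohol", "Fish", "Alcohol", "Fish"],
})

# Store set per (period, MKT); a frozenset is hashable and ignores order.
store_sets = {key: frozenset(g)
              for key, g in df.groupby(["PERIOD_ID", "MKT"])["STORE_ID"]}

# Collect the MKTs that map to the same store set within a period.
groups = {}
for (period, mkt), stores in store_sets.items():
    groups.setdefault((int(period), stores), []).append(mkt)

# Keep only the store sets that are shared by more than one MKT.
duplicates = {}
for (period, _), mkts in groups.items():
    if len(mkts) > 1:
        duplicates.setdefault(period, []).append(tuple(mkts))
print(duplicates)
```

Each period maps to a list of tuples rather than a single tuple, because a period could in principle contain more than one group of MKTs with identical stores. For this sample there is one group per period: 'PM Provision' with 'Provision including MM', and 'Alcohol' with 'Fish'.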
You can do this with a dict comprehension:
dicta = {p: g.groupby('item')['store'].apply(tuple).to_dict()
         for p, g in df.groupby('period')}
[out]
{1: {"'A'": (32, 34), "'B'": (32, 34)}, 2: {"'X'": (42, 44), "'Y'": (42, 44)}}
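The keys in this output look like "'A'" because the item strings in that test frame evidently contain the quote characters themselves. If your data has the same quirk and you want clean keys, a small sketch (assuming the quotes are literal single quotes at both ends of each value, and using a hypothetical cut-down frame) is to strip them before grouping:

```python
import pandas as pd

# Hypothetical frame whose item values carry literal quotes, e.g. "'A'".
df = pd.DataFrame({
    "period": [1, 1, 2, 2],
    "store": [32, 34, 42, 44],
    "item": ["'A'", "'A'", "'X'", "'X'"],
})

# Remove leading and trailing single quotes from every item value.
df["item"] = df["item"].str.strip("'")

# Same dict comprehension as above, now producing keys without quotes.
dicta = {int(p): g.groupby("item")["store"].apply(tuple).to_dict()
         for p, g in df.groupby("period")}
print(dicta)
```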