
Creating a dictionary in Python from a pandas DataFrame

I have a pandas DataFrame with three columns: Lot Number, Price and Image Id. I need to create a JSON file in the following format:


{'1200-1300':{'LOT3551': [9082327, 9082329],
              'LOT3293':[982832, 898762, 887654]
             },
 '1300-1400': {'LOT2219': [776542, 119234]
              }
}

where the first-level keys of the dictionary, i.e. '1200-1300', '1300-1400' etc., are the price ranges. The keys inside each price range are the Lot Numbers that fall within that range, and their values are the values from the Image Id column.

This is the code I have tried so far:

for idx, gid_list in enumerate(df['AV Gid']):
    data = df.iloc[idx]
    lot_no = data['Lot Number']
    price = data['Final Price']
    gids = gid_list.replace("[","").replace("]","").split(",")
    if price >= 1000 and price < 1100:
        pr = '10-11'
    elif price >= 1100 and price < 1200:
        pr = '11-12'
    else:
        continue
    print(pr)
        
    if lot_no in sample_dict[pr]:
        sampe_dict[pr][lot_no].append(gid)
    else:
        #print(pr)
        sample_dict[pr][lot_no] = []

Here sample_dict has the price ranges as its keys. The problem with the above code is that it also fills the keys of the other price ranges.

I would do something like this:

price_ranges = {'10-11': [1000, 1099], '11-12': [1100, 1199], '0-10': [0, 999]}

# One separate dict per range; dict.fromkeys(keys, {}) would store the same dict under every key
sample_dict = {label: {} for label in price_ranges}

def look_for_range(price, price_ranges=price_ranges):
    # Return the label of the first range that contains price, or None if no range matches
    for label, (low, high) in price_ranges.items():
        if low <= price <= high:
            return label
    return None

def compose_range_dict(row, sample_dict=sample_dict):
    # Store the row's image ids under its lot number, inside the matching price range
    range_label = look_for_range(row['PRICE'])
    if range_label is not None:
        sample_dict[range_label].update({row['LOTNUMBER']: row['IMAGE_ID']})

Then

import pandas as pd

# dictionary of lists
testdict = {'LOTNUMBER': ['LOT3551', 'LOT3520', 'LOT3574', 'LOT3572'],
            'PRICE': [1250, 1150, 10, 900],
            'IMAGE_ID': [[9082327, 9082328, 9082329],
                         [9081865, 9081866, 9081867],
                         [9083230, 9083231, 9083232],
                         [9082985, 9082986, 9082988]]}

testdf = pd.DataFrame(testdict)

testdf.apply(compose_range_dict, axis=1)

# >>> sample_dict
# {'10-11': {},
#  '11-12': {'LOT3520': [9081865, 9081866, 9081867]},
#  '0-10': {'LOT3574': [9083230, 9083231, 9083232], 'LOT3572': [9082985, 9082986, 9082988]}}
# LOT3551 (price 1250) falls outside every defined range and is skipped
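
If every price range ends up holding the same lots, one likely cause is how the outer dictionary was pre-filled. The sketch below is only an illustration with made-up keys and a made-up lot name: it shows that dict.fromkeys with a mutable default stores one shared dict under all keys, while a dict comprehension creates a fresh dict per key.

```python
keys = ["10-11", "11-12"]

# dict.fromkeys reuses the single {} object for every key
shared = dict.fromkeys(keys, {})
shared["10-11"]["LOT1"] = [1, 2]

# a dict comprehension evaluates {} once per key, so each value is independent
separate = {key: {} for key in keys}
separate["10-11"]["LOT1"] = [1, 2]

print(shared)    # {'10-11': {'LOT1': [1, 2]}, '11-12': {'LOT1': [1, 2]}}
print(separate)  # {'10-11': {'LOT1': [1, 2]}, '11-12': {}}
```

The same applies to any mutable default passed to dict.fromkeys, such as a list.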

If df is your dataframe you could try:

data = {
    f"{p}-{p + 100}": ser.to_dict()
    for p, ser in df.assign(Price=df["Price"].floordiv(100).mul(100))
                    .set_index("Lot Number")
                    .groupby("Price")["Image Id"]
}
  • Replace the Price column with its values floored to the lower hundred, via .floordiv(100).mul(100)
  • Set the Lot Number column as the index
  • .groupby() the resulting dataframe by the Price column, take the Image Id column of each group as a series, and place the results in a dictionary with:
    • the strings f"{p}-{p + 100}" as keys (p is the floored price of the group), and
    • each group's series converted to a dictionary as values
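
To make the flooring step concrete, here is a small sketch on a standalone series; the prices are invented for illustration and the names floored and labels are not part of the answer's code.

```python
import pandas as pd

# Hypothetical prices, used only to show the binning arithmetic
prices = pd.Series([1200, 1250, 10, 1300])

# Integer-divide by 100, then multiply back: each price drops to the lower hundred
floored = prices.floordiv(100).mul(100)

# Build the range label from each floored price
labels = [f"{p}-{p + 100}" for p in floored]

print(floored.tolist())  # [1200, 1200, 0, 1300]
print(labels)            # ['1200-1300', '1200-1300', '0-100', '1300-1400']
```

Prices that share a floored value get the same label, which is what lets groupby collect them into one price range.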

Result for

data = {"Lot Number":["LOT1", "LOT2", "LOT3", "LOT4", "LOT5", "LOT6"],
        "Price": [1200, 1250, 10, 20, 30, 1300],
        "Image Id": [list(range(n)) for n in range(1, 7)]}
df = pd.DataFrame(data)
  Lot Number  Price            Image Id
0       LOT1   1200                 [0]
1       LOT2   1250              [0, 1]
2       LOT3     10           [0, 1, 2]
3       LOT4     20        [0, 1, 2, 3]
4       LOT5     30     [0, 1, 2, 3, 4]
5       LOT6   1300  [0, 1, 2, 3, 4, 5]

is

{'0-100': {'LOT3': [0, 1, 2], 'LOT4': [0, 1, 2, 3], 'LOT5': [0, 1, 2, 3, 4]},
 '1200-1300': {'LOT1': [0], 'LOT2': [0, 1]},
 '1300-1400': {'LOT6': [0, 1, 2, 3, 4, 5]}}

You could do the same in a single pandas method chain:

data = (
    df.assign(
        Price=df["Price"].floordiv(100).mul(100).map(lambda p: f"{p}-{p + 100}")
    )
    .set_index("Lot Number")
    .groupby("Price")["Image Id"]
    .agg(dict)
    .to_dict()
)
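
Since the goal in the question is a JSON file, the resulting dictionary can be written out with the standard json module. This is a minimal sketch, assuming the image ids are plain Python ints; the file name price_ranges.json and the temporary directory are placeholders chosen for the example.

```python
import json
import tempfile
from pathlib import Path

# Example result in the nested shape shown in the question
data = {
    "1200-1300": {"LOT3551": [9082327, 9082329]},
    "1300-1400": {"LOT2219": [776542, 119234]},
}

# Hypothetical output location; a temporary directory keeps the sketch self-contained
out_path = Path(tempfile.mkdtemp()) / "price_ranges.json"
with out_path.open("w", encoding="utf-8") as fh:
    json.dump(data, fh, indent=2)

# Read the file back to confirm the round trip
with out_path.open(encoding="utf-8") as fh:
    loaded = json.load(fh)
```

If the lists hold NumPy integers rather than Python ints, json.dump raises a TypeError, so such values would need converting with int() first.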

