[英]Extracting values from dictionary list in pandas dataframe
我有以下 pandas dataframe:
pd.DataFrame({'keys': {3: 'brandId', 5: 'price', 14: 'sizes', 18: 'brandId', 20: 'price', 29: 'sizes', 30: 'condition', 31: 'condition', 32: 'colour', 33: 'age', 36: 'brand', 40: 'colour', 41: 'brand', 44: 'productType', 50: 'brandId', 52: 'price', 61: 'sizes', 62: 'condition', 63: 'colour', 64: 'age', 67: 'brand', 70: 'productType'}, 'values': {3: 925, 5: {'currencyName': 'GBP', 'priceAmount': '50.00', 'nationalShippingCost': '3.00'}, 14: {'id': 4, 'name': 'UK 4', 'quantity': 1}, 18: 925, 20: {'currencyName': 'GBP', 'priceAmount': '11.00', 'nationalShippingCost': '0.00'}, 29: {'id': 3, 'name': 'S', 'quantity': 1}, 30: {'id': 'used_like_new', 'name': 'Like new'}, 31: {'id': 'brand_new', 'name': 'Brand new'}, 32: {'id': 'multi', 'name': 'Multi'}, 33: {'id': 'modern', 'name': 'Modern'}, 36: 'chinese-laundry', 40: {'id': 'white', 'name': 'White'}, 41: 'chinese-laundry', 44: 'tshirts', 50: 925, 52: {'currencyName': 'GBP', 'priceAmount': '20.00', 'nationalShippingCost': '3.00'}, 61: {'id': 11, 'name': 'M', 'quantity': 1}, 62: {'id': 'brand_new', 'name': 'Brand new'}, 63: {'id': 'black', 'name': 'Black'}, 64: {'id': '90s', 'name': '90s'}, 67: 'chinese-laundry', 70: 'jackets'}})
看起來像這樣:
keys values
3 brandId 925
5 price {'currencyName': 'GBP', 'priceAmount': '50.00'...
14 sizes {'id': 4, 'name': 'UK 4', 'quantity': 1}
18 brandId 925
20 price {'currencyName': 'GBP', 'priceAmount': '11.00'...
29 sizes {'id': 3, 'name': 'S', 'quantity': 1}
30 condition {'id': 'used_like_new', 'name': 'Like new'}
...
我想將屬於其鍵的特定值的字典展平。 例如,只獲取priceAmount
的值,以及任何其他字典鍵中name
的值。
所以預期 output:
keys values
3 brandId 925
5 price 50.00
14 sizes UK 4
18 brandId 925
20 price 11.00
29 sizes S
30 condition Like new}
我可以用下面的方法來做到這一點,如果我有更多的東西要更換的話,這會花費很長時間!
price_data = []
for price in data[data['keys'].str.contains('price', na=False)].values:
price_data.append(price[1]['priceAmount'])
condition_data = []
for condition in data[data['keys'].str.contains('condition', na=False)].values:
condition_data.append(condition[1]['name'])
age_data = []
for age in data[data['keys'].str.contains('age', na=False)].values:
age_data.append(age[1]['name'])
sizes_data = []
for sizes in data[data['keys'].str.contains('sizes', na=False)].values:
sizes_data.append(sizes[1]['name'])
colour_data = []
for colour in data[data['keys'].str.contains('colour', na=False)].values:
colour_data.append(colour[1]['name'])
#replace the values
data=data.replace(data[data['keys'].str.contains('price', na=False)]['values'].values, price_data)
data=data.replace(data[data['keys'].str.contains('condition', na=False)]['values'].values, condition_data)
data=data.replace(data[data['keys'].str.contains('age', na=False)]['values'].values, age_data)
data=data.replace(data[data['keys'].str.contains('sizes', na=False)]['values'].values, sizes_data)
data=data.replace(data[data['keys'].str.contains('colour', na=False)]['values'].values, colour_data)
有沒有更快更流暢的替代方案?
也許,如果您可以訪問用於制作df
的dict
,則可以改用json_normalize()
。
例如:
d = {
'keys': {
3: 'brandId', 5: 'price', 14: 'sizes', 18: 'brandId', 20: 'price', 29: 'sizes', 30: 'condition',
31: 'condition', 32: 'colour', 33: 'age', 36: 'brand', 40: 'colour', 41: 'brand', 44: 'productType',
50: 'brandId', 52: 'price', 61: 'sizes', 62: 'condition', 63: 'colour', 64: 'age', 67: 'brand',
70: 'productType',
},
'values': {
3: 925, 5: {'currencyName': 'GBP', 'priceAmount': '50.00', 'nationalShippingCost': '3.00'},
14: {'id': 4, 'name': 'UK 4', 'quantity': 1}, 18: 925,
20: {'currencyName': 'GBP', 'priceAmount': '11.00', 'nationalShippingCost': '0.00'},
29: {'id': 3, 'name': 'S', 'quantity': 1}, 30: {'id': 'used_like_new', 'name': 'Like new'},
31: {'id': 'brand_new', 'name': 'Brand new'}, 32: {'id': 'multi', 'name': 'Multi'}, 33:
{'id': 'modern', 'name': 'Modern'}, 36: 'chinese-laundry', 40: {'id': 'white', 'name': 'White'},
41: 'chinese-laundry', 44: 'tshirts', 50: 925,
52: {'currencyName': 'GBP', 'priceAmount': '20.00', 'nationalShippingCost': '3.00'},
61: {'id': 11, 'name': 'M', 'quantity': 1}, 62: {'id': 'brand_new', 'name': 'Brand new'},
63: {'id': 'black', 'name': 'Black'}, 64: {'id': '90s', 'name': '90s'}, 67: 'chinese-laundry',
70: 'jackets',
},
}
請注意,這個 dict 的形式有點不尋常, keys
和values
在頂層分開。 為了將它們放在一起,以便可以使用json_normalize()
,我們希望將它們合並,以便所有記錄都是完整的(鍵和值)。 這會分解出每個數字鍵。 請注意,由於我假設值中有許多記錄(可能是字典列表?),因此您必須對它們中的每一個都執行此操作。
>>> mod_d = {d['keys'][i]: v for i, v in d['values'].items()}
>>> mod_d
{'brandId': 925,
'price': {'currencyName': 'GBP',
'priceAmount': '20.00',
'nationalShippingCost': '3.00'},
'sizes': {'id': 11, 'name': 'M', 'quantity': 1},
'condition': {'id': 'brand_new', 'name': 'Brand new'},
'colour': {'id': 'black', 'name': 'Black'},
'age': {'id': '90s', 'name': '90s'},
'brand': 'chinese-laundry',
'productType': 'jackets'}
有了這個,我們現在可以使用json_normalize()
:
>>> df = pd.json_normalize(mod_d)
>>> df
brandId brand productType price.currencyName price.priceAmount \
0 925 chinese-laundry jackets GBP 20.00
price.nationalShippingCost sizes.id sizes.name sizes.quantity \
0 3.00 11 M 1
condition.id condition.name colour.id colour.name age.id age.name
0 brand_new Brand new black Black 90s 90s
,
pandas 字符串方法允許訪問列表/元組/字典中的值:
df['val'] = np.where(df['keys'] == 'price',
df['values'].str['priceAmount'],
df['values'].str['name'])
df['val'] = df['val'].fillna(df['values'])
keys values val
3 brandId 925 925
5 price {'currencyName': 'GBP', 'priceAmount': '50.00'... 50.00
14 sizes {'id': 4, 'name': 'UK 4', 'quantity': 1} UK 4
18 brandId 925 925
20 price {'currencyName': 'GBP', 'priceAmount': '11.00'... 11.00
29 sizes {'id': 3, 'name': 'S', 'quantity': 1} S
30 condition {'id': 'used_like_new', 'name': 'Like new'} Like new
31 condition {'id': 'brand_new', 'name': 'Brand new'} Brand new
32 colour {'id': 'multi', 'name': 'Multi'} Multi
33 age {'id': 'modern', 'name': 'Modern'} Modern
36 brand chinese-laundry chinese-laundry
40 colour {'id': 'white', 'name': 'White'} White
41 brand chinese-laundry chinese-laundry
44 productType tshirts tshirts
50 brandId 925 925
52 price {'currencyName': 'GBP', 'priceAmount': '20.00'... 20.00
61 sizes {'id': 11, 'name': 'M', 'quantity': 1} M
62 condition {'id': 'brand_new', 'name': 'Brand new'} Brand new
63 colour {'id': 'black', 'name': 'Black'} Black
64 age {'id': '90s', 'name': '90s'} 90s
67 brand chinese-laundry chinese-laundry
70 productType jackets jackets
另一種選擇是使用簡單的列表推導:
df['values'] = [i.get('priceAmount') or i.get('name') if isinstance(i, dict) else i for i in df['values'].tolist()]
Output:
keys values
3 brandId 925
5 price 50.00
14 sizes UK 4
18 brandId 925
20 price 11.00
29 sizes S
30 condition Like new
31 condition Brand new
32 colour Multi
33 age Modern
36 brand chinese-laundry
40 colour White
41 brand chinese-laundry
44 productType tshirts
50 brandId 925
52 price 20.00
61 sizes M
62 condition Brand new
63 colour Black
64 age 90s
67 brand chinese-laundry
70 productType jackets
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.