How to write these JSON multi-lists into separate columns in Python pandas
I have this JSON file:
{"a": [{"Name": "name1",
"number": "number1",
"defaultPrice": {"p": "232", "currency": "CAD"},
"prices": {"DZ": {"p": "62", "currency": "RMB"},
"AU": {"p": "73", "currency": "AUD"},
"lg": "en"}},
{"Name": "name2",
"number": "number2",
"defaultPrice": {"p": "233", "currency": "CAD"},
"prices": {"DZ": {"p": "63", "currency": "RMB"},
"US": {"p": "72", "currency": "USD"},
"Lg": "en"}}]}
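Right now I get a table with the columns Name, number, defaultPrice and prices, but each cell of the prices column holds about three entries, and the price has to be read from the key p, e.g. 63 from "p": "63", "currency": "RMB".
What I want instead is a table with the price and the currency in separate columns. I tried the following: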
ndf = pd.concat([pd.Series(x) for x in prices], axis=1)
But that just gives a wrong result:
0 1
DZ {"p": "232", "currency": "CAD"} {"p": "62", "currency": "RMB"}
AU {"p": "233", "currency": "CAD"} {"p": "63","currency":"RMB"}
Is there any way to fix this so that I get the expected output?
Name Number Code currency
name1 number1 AU AUD
name1 number1 DZ RMB
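Thank you very much!!
You can use apply(pd.Series) on the defaultPrice column to split it into separate columns, and then concatenate the result back onto the original DataFrame.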
prices = {"a": [{"Name": "name1",
"number": "number1",
"defaultPrice": {"p": "232", "currency": "CAD"},
"prices": {"DZ": {"p": "62", "currency": "RMB"},
"AU": {"p": "73", "currency": "AUD"},
"lg": "en"}},
{"Name": "name2",
"number": "number2",
"defaultPrice": {"p": "233", "currency": "CAD"},
"prices": {"DZ": {"p": "63", "currency": "RMB"},
"US": {"p": "72", "currency": "USD"},
"Lg": "en"}}]}
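import pandas as pd
ndf = pd.DataFrame(prices['a'])
# Expand each defaultPrice dict into its own columns (p, currency), then drop the original column
pd.concat([ndf, ndf['defaultPrice'].apply(pd.Series)], axis=1).drop('defaultPrice', axis=1)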
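However, your prices column still holds nested dictionaries. Since you did not say how you want to handle it, I left it as is (and omitted it from the output below).
Output: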
Name number p currency
name1 number1 232 CAD
name2 number2 233 CAD
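As an alternative to apply(pd.Series), pandas' json_normalize can flatten the nested defaultPrice dictionaries directly. The following is only a sketch: it assumes pandas 1.0 or later (where json_normalize is available as pd.json_normalize) and uses a trimmed copy of the records without the prices key. The names records and flat are illustrative and not part of the original answer.

```python
import pandas as pd

# Trimmed copy of the records from the question (the "prices" key is left out here)
records = [
    {"Name": "name1", "number": "number1",
     "defaultPrice": {"p": "232", "currency": "CAD"}},
    {"Name": "name2", "number": "number2",
     "defaultPrice": {"p": "233", "currency": "CAD"}},
]

# json_normalize flattens nested dicts into dotted column names,
# e.g. "defaultPrice.p" and "defaultPrice.currency"
flat = pd.json_normalize(records)

# Rename the dotted columns to match the output shown above
flat = flat.rename(columns={"defaultPrice.p": "p",
                            "defaultPrice.currency": "currency"})
flat = flat[["Name", "number", "p", "currency"]]
print(flat)
```

This should produce the same four columns as the apply(pd.Series) approach, without the intermediate concat and drop.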
The JSON string:
j = {"a": [{ "Name": "name1",
"number": "number1",
"defaultPrice": {"p": "232", "currency": "CAD"},
"prices": {"DZ": {"p": "62", "currency": "RMB"},
"AU": {"p": "73", "currency": "AUD"},
"lg": "en"
}
},
{"Name": "name2",
"number": "number2",
"defaultPrice": {"p": "233", "currency": "CAD"},
"prices": {"DZ": {"p": "63", "currency": "RMB"},
"US": {"p": "72", "currency": "USD"},
"Lg": "en"
}
}
]}
Code to get the desired output:
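from pandas import json_normalize  # pandas < 1.0: from pandas.io.json import json_normalize

# Collect every key that appears under "prices", then drop the language keys
country_codes = set()
for d in j['a']:
    c = d['prices'].keys()
    country_codes.update(c)
country_codes = sorted([i for i in country_codes if i not in ['lg', 'Lg']])
country_codes
# Carry Name, number and every per-country price/currency along as metadata
meta = ['Name', 'number'] + [['prices', c, 'p'] for c in country_codes] + [['prices', c, 'currency'] for c in country_codes]
# Note: record_path points at a dict here, which relied on older pandas behaviour;
# recent pandas versions expect a list at record_path and may raise a TypeError.
df = json_normalize(j['a'], record_path='prices', meta=meta, errors='ignore')
df = df.rename(columns={0: 'countryCode'})
df = df[~df['countryCode'].isin(['lg', 'Lg'])]
# For each row, pick the price/currency columns that belong to that row's country
for idx, row in df.iterrows():
    country = row['countryCode']
    col_price = df.columns[df.columns.str.contains(country + '.p', regex=False)][0]
    col_currency = df.columns[df.columns.str.contains(country + '.currency', regex=False)][0]
    price = row[col_price]
    currency = row[col_currency]
    df.loc[idx, 'price'] = price
    df.loc[idx, 'currency'] = currency
df = df[['Name', 'number', 'countryCode', 'currency', 'price']]
df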
This gives:
Name number countryCode currency price
0 name1 number1 DZ RMB 62
1 name1 number1 AU AUD 73
3 name2 number2 DZ RMB 63
4 name2 number2 US USD 72
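The json_normalize call above passes a dict as record_path, which recent pandas releases may reject. As a hedged alternative, the same long table can be built with a plain Python loop over the records and a single pd.DataFrame call. This is a sketch, not the original answerer's method; the names rows and df_long are illustrative, and it assumes that every real price entry is a dict while language markers such as "lg" are plain strings.

```python
import pandas as pd

j = {"a": [
    {"Name": "name1", "number": "number1",
     "defaultPrice": {"p": "232", "currency": "CAD"},
     "prices": {"DZ": {"p": "62", "currency": "RMB"},
                "AU": {"p": "73", "currency": "AUD"},
                "lg": "en"}},
    {"Name": "name2", "number": "number2",
     "defaultPrice": {"p": "233", "currency": "CAD"},
     "prices": {"DZ": {"p": "63", "currency": "RMB"},
                "US": {"p": "72", "currency": "USD"},
                "Lg": "en"}},
]}

rows = []
for item in j["a"]:
    for code, entry in item["prices"].items():
        # Skip non-price entries such as "lg": "en" (a string, not a dict)
        if not isinstance(entry, dict):
            continue
        rows.append({
            "Name": item["Name"],
            "number": item["number"],
            "countryCode": code,
            "currency": entry["currency"],
            "price": entry["p"],
        })

# One row per (record, country code) pair, in the order they appear in the input
df_long = pd.DataFrame(
    rows, columns=["Name", "number", "countryCode", "currency", "price"]
)
print(df_long)
```

Because the rows are assembled before the DataFrame is created, the index runs from 0 to 3 with no gaps, and no per-row column lookup by country code is needed.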