Convert csv file into nested JSON-like file using pandas
I'm a pandas/Python newbie and was wondering if you could help me with the following issue.
Consider the following csv file:
country,continent,year,productA,productB
NLD,Europe,2012,1000,500
NLD,Europe,2013,100,50
NLD,Europe,2014,150,40
NLD,Europe,2015,200,70
CAN,America,2012,30,40
CAN,America,2013,50,90
CAN,America,2014,200,2000
CAN,America,2015,20,30
JPN,Asia,2012,100,2000
JPN,Asia,2013,400,100
JPN,Asia,2014,300,3000
JPN,Asia,2015,400,370
I would like to rewrite it as a JSON-like file:
[
    {
        country: 'NLD',
        continent: 'Europe',
        productA: {
            2012: '1000',
            2013: '100',
            2014: '150',
            2015: '200',
        },
        productB: {
            2012: '500',
            2013: '50',
            2014: '40',
            2015: '70',
        },
    },
    {
        country: 'CAN',
        continent: 'America',
        productA: {
            2012: '30',
            2013: '50',
            2014: '200',
            2015: '20',
        },
        productB: {
            2012: '40',
            2013: '90',
            2014: '2000',
            2015: '30',
        },
    },
    {
        country: 'JPN',
        continent: 'Asia',
        productA: {
            2012: '100',
            2013: '400',
            2014: '300',
            2015: '400',
        },
        productB: {
            2012: '2000',
            2013: '100',
            2014: '3000',
            2015: '370',
        },
    },
]
This question is similar, but I was not able to adapt the answer to my needs due to my limited knowledge. By using the answer to the said question, I can write this snippet:
json = (df.groupby(['country', 'continent'], as_index=False)
          .apply(lambda x: dict(zip(x.year, x.productA)))
          .reset_index()
          .rename(columns={0: 'productA'})
          .to_json(orient='records'))
which results in:
[
    {
        country: 'NLD',
        continent: 'Europe',
        productA: {
            2012: '1000',
            2013: '100',
            2014: '150',
            2015: '200',
        },
    },
    {
        country: 'CAN',
        continent: 'America',
        productA: {
            2012: '30',
            2013: '50',
            2014: '200',
            2015: '20',
        },
    },
    {
        country: 'JPN',
        continent: 'Asia',
        productA: {
            2012: '100',
            2013: '400',
            2014: '300',
            2015: '400',
        },
    },
]
I would be most grateful if you could help me reach the desired output (inclusion of productB) and suggest resources that I could use to improve my data wrangling skills using Pandas.
Thank you!
Notice that DataFrame.to_dict() does almost what you want (even the orientation is right; see the documentation for other options). To get the country/continent tuple, just loop over the groups:
dictlist = []
# Each iteration yields the (country, continent) key and that group's sub-frame
for i, j in df.groupby(['country', 'continent']):
    thedict = j.to_dict()
    thedict["country"] = i[0]
    thedict["continent"] = i[1]
    dictlist.append(thedict)
I am pretty sure that some small variation on this will do what you want.
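One possible variation is sketched below. It is only an illustration under a few assumptions, not the answerer's exact code: the CSV from the question is inlined as a string (CSV_TEXT is a made-up name) so the example runs on its own, and the per-product dictionaries are built with dict comprehensions over zip, as in the question's snippet, rather than with to_dict(), so that years and values are plain Python ints that the json module can serialise. Note that the values come out as numbers, not as the quoted strings shown in the question.

```python
import io
import json

import pandas as pd

# The CSV from the question, inlined so the sketch is self-contained.
CSV_TEXT = """country,continent,year,productA,productB
NLD,Europe,2012,1000,500
NLD,Europe,2013,100,50
NLD,Europe,2014,150,40
NLD,Europe,2015,200,70
CAN,America,2012,30,40
CAN,America,2013,50,90
CAN,America,2014,200,2000
CAN,America,2015,20,30
JPN,Asia,2012,100,2000
JPN,Asia,2013,400,100
JPN,Asia,2014,300,3000
JPN,Asia,2015,400,370
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
product_columns = ['productA', 'productB']

records = []
# sort=False keeps the groups in the order they first appear in the file.
for (country, continent), group in df.groupby(['country', 'continent'], sort=False):
    record = {'country': country, 'continent': continent}
    for column in product_columns:
        # Map each year to that product's value; int() turns NumPy
        # integers into plain Python ints so json.dumps accepts them.
        record[column] = {
            int(year): int(value)
            for year, value in zip(group['year'], group[column])
        }
    records.append(record)

# json.dumps converts the integer year keys to strings, as JSON requires.
print(json.dumps(records, indent=2))
```

To cover more product columns, extending product_columns should be enough; for a real file, pd.read_csv can be pointed at the file path instead of the inlined string.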