Convert csv file into nested JSON-like file using pandas

I'm a pandas/Python newbie and was wondering if you could help me with the following issue.

Consider the following CSV file:

country,continent,year,productA,productB
NLD,Europe,2012,1000,500
NLD,Europe,2013,100,50
NLD,Europe,2014,150,40
NLD,Europe,2015,200,70
CAN,America,2012,30,40
CAN,America,2013,50,90
CAN,America,2014,200,2000
CAN,America,2015,20,30
JPN,Asia,2012,100,2000
JPN,Asia,2013,400,100
JPN,Asia,2014,300,3000
JPN,Asia,2015,400,370
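To experiment without a file on disk, the sample can be read from an inline string. This is a minimal sketch: `CSV_TEXT` is an illustrative name, and with a real file you would pass its path to `pd.read_csv` instead. Note that pandas infers `year`, `productA` and `productB` as integer columns, so JSON written from this frame will contain numbers rather than the quoted strings shown in the target below.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the snippet runs without a file.
CSV_TEXT = """\
country,continent,year,productA,productB
NLD,Europe,2012,1000,500
NLD,Europe,2013,100,50
NLD,Europe,2014,150,40
NLD,Europe,2015,200,70
CAN,America,2012,30,40
CAN,America,2013,50,90
CAN,America,2014,200,2000
CAN,America,2015,20,30
JPN,Asia,2012,100,2000
JPN,Asia,2013,400,100
JPN,Asia,2014,300,3000
JPN,Asia,2015,400,370
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.dtypes)  # year, productA and productB are parsed as integers
```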

I would like to rewrite it as a JSON-like file:

[
  {
    country: 'NLD',
    continent: 'Europe',
    productA: {
      2012: '1000',
      2013: '100',
      2014: '150',
      2015: '200',
    },
    productB: {
      2012: '500',
      2013: '50',
      2014: '40',
      2015: '70',
    },
  },
  {
    country: 'CAN',
    continent: 'America',
    productA: {
      2012: '30',
      2013: '50',
      2014: '200',
      2015: '20',
    },
    productB: {
      2012: '40',
      2013: '90',
      2014: '2000',
      2015: '30',
    },
  },
  {
    country: 'JPN',
    continent: 'Asia',
    productA: {
      2012: '100',
      2013: '400',
      2014: '300',
      2015: '400',
    },
    productB: {
      2012: '2000',
      2013: '100',
      2014: '3000',
      2015: '370',
    },
  },
]

This question is similar, but due to my limited knowledge I was not able to adapt its answer to my needs. Using the answer to that question, I can write this snippet:

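import pandas as pd

df = pd.read_csv('data.csv')  # file name assumed

# Default as_index=True: apply returns a Series of {year: productA} dicts
json_str = (df.groupby(['country', 'continent'])
            .apply(lambda x: dict(zip(x.year, x.productA)))
            .reset_index()
            .rename(columns={0: 'productA'})
            .to_json(orient='records'))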

which results in:

[
  {
    country: 'NLD',
    continent: 'Europe',
    productA: {
      2012: '1000',
      2013: '100',
      2014: '150',
      2015: '200',
    },
  },
  {
    country: 'CAN',
    continent: 'America',
    productA: {
      2012: '30',
      2013: '50',
      2014: '200',
      2015: '20',
    },
  },
  {
    country: 'JPN',
    continent: 'Asia',
    productA: {
      2012: '100',
      2013: '400',
      2014: '300',
      2015: '400',
    },
  },
]
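The snippet above only builds `productA`. One way to add `productB` while keeping its `groupby`/`apply` pattern is to build one Series of `{year: value}` dicts per product column and place them side by side with `pd.concat`. This is a sketch assuming a reasonably recent pandas; `year_dict`, `KEYS` and `PRODUCTS` are hypothetical names, and the data is inlined rather than read from a file.

```python
import io

import pandas as pd

CSV_TEXT = """\
country,continent,year,productA,productB
NLD,Europe,2012,1000,500
NLD,Europe,2013,100,50
NLD,Europe,2014,150,40
NLD,Europe,2015,200,70
CAN,America,2012,30,40
CAN,America,2013,50,90
CAN,America,2014,200,2000
CAN,America,2015,20,30
JPN,Asia,2012,100,2000
JPN,Asia,2013,400,100
JPN,Asia,2014,300,3000
JPN,Asia,2015,400,370
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

KEYS = ["country", "continent"]
PRODUCTS = ["productA", "productB"]


def year_dict(group, col):
    """Hypothetical helper: map year -> value of `col` within one group."""
    return dict(zip(group["year"].tolist(), group[col].tolist()))


# One Series of dicts per product, indexed by (country, continent).
# sort=False keeps the groups in order of first appearance (NLD, CAN, JPN).
parts = [
    df.groupby(KEYS, sort=False)[["year", col]].apply(year_dict, col).rename(col)
    for col in PRODUCTS
]

# Put the Series side by side, then turn the (country, continent) index into columns.
json_str = pd.concat(parts, axis=1).reset_index().to_json(orient="records")
print(json_str)
```

In the resulting JSON the year keys become strings ("2012", ...) and the values are plain numbers. If the quoted values of the target are really needed, `group[col].astype(str)` inside the helper would produce them, and `to_json` can write straight to a file through its `path_or_buf` argument.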

I would be most grateful if you could help me reach the desired output (including productB) and suggest resources I could use to improve my data-wrangling skills with pandas.

Thank you!

Notice that DataFrame.to_dict() does almost what you want (even the orientation is right; see the documentation for the other options). To get the country and continent from each group's key tuple, just loop over the groups:

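import pandas as pd

df = pd.read_csv('data.csv')  # file name assumed

dictlist = []
for (country, continent), group in df.groupby(['country', 'continent']):
    thedict = group.to_dict()  # {column: {row index: value}}
    thedict["country"] = country
    thedict["continent"] = continent
    dictlist.append(thedict)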

I am pretty sure that some small variation on this will do what you want.
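One such variation, sketched under the assumption that the data is loaded as in the question (inlined here so it runs on its own): indexing each group by `year` before calling `to_dict()` makes the default `'dict'` orientation produce `{column: {year: value}}`, which is the nested shape wanted for `productA` and `productB`. Passing `sort=False` to `groupby` keeps the countries in their original order, and the standard-library `json` module writes the list out.

```python
import io
import json

import pandas as pd

CSV_TEXT = """\
country,continent,year,productA,productB
NLD,Europe,2012,1000,500
NLD,Europe,2013,100,50
NLD,Europe,2014,150,40
NLD,Europe,2015,200,70
CAN,America,2012,30,40
CAN,America,2013,50,90
CAN,America,2014,200,2000
CAN,America,2015,20,30
JPN,Asia,2012,100,2000
JPN,Asia,2013,400,100
JPN,Asia,2014,300,3000
JPN,Asia,2015,400,370
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

dictlist = []
for (country, continent), group in df.groupby(["country", "continent"], sort=False):
    # Index by year so to_dict() gives {column: {year: value}}.
    products = group.set_index("year")[["productA", "productB"]].to_dict()
    dictlist.append({"country": country, "continent": continent, **products})

# json.dumps turns the integer year keys into strings ("2012", ...).
print(json.dumps(dictlist, indent=2))
```

To write a file instead of printing, `json.dump(dictlist, f, indent=2)` with an open file handle `f` works the same way.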

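Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM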