Convert a csv file to a nested JSON-like file using pandas

Question

I'm a pandas/Python newbie and was wondering if you could help me with the following issue.

Consider the following csv file:

country,continent,year,productA,productB
NLD,Europe,2012,1000,500
NLD,Europe,2013,100,50
NLD,Europe,2014,150,40
NLD,Europe,2015,200,70
CAN,America,2012,30,40
CAN,America,2013,50,90
CAN,America,2014,200,2000
CAN,America,2015,20,30
JPN,Asia,2012,100,2000
JPN,Asia,2013,400,100
JPN,Asia,2014,300,3000
JPN,Asia,2015,400,370
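
For reference, the snippets further down use a DataFrame named df without showing how it is created. A minimal sketch of loading the data above follows. It reads from an inline string (the name CSV_TEXT is only for illustration) so that it runs on its own; with a real file, the path would be passed to pd.read_csv instead.

```python
import io

import pandas as pd

# Inline copy of the csv shown above (CSV_TEXT is an illustrative name).
CSV_TEXT = """country,continent,year,productA,productB
NLD,Europe,2012,1000,500
NLD,Europe,2013,100,50
NLD,Europe,2014,150,40
NLD,Europe,2015,200,70
CAN,America,2012,30,40
CAN,America,2013,50,90
CAN,America,2014,200,2000
CAN,America,2015,20,30
JPN,Asia,2012,100,2000
JPN,Asia,2013,400,100
JPN,Asia,2014,300,3000
JPN,Asia,2015,400,370
"""

# read_csv accepts any file-like object, so StringIO stands in for a file on disk.
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.shape)
print(df.head())
```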

I would like to rewrite it as a JSON-like file:

[
  {
    country: 'NLD',
    continent: 'Europe',
    productA: {
      2012: '1000',
      2013: '100',
      2014: '150',
      2015: '200',
    },
    productB: {
      2012: '500',
      2013: '50',
      2014: '40',
      2015: '70',
    },
  },
  {
    country: 'CAN',
    continent: 'America',
    productA: {
      2012: '30',
      2013: '50',
      2014: '200',
      2015: '20',
    },
    productB: {
      2012: '40',
      2013: '90',
      2014: '2000',
      2015: '30',
    },
  },
  {
    country: 'JPN',
    continent: 'Asia',
    productA: {
      2012: '100',
      2013: '400',
      2014: '300',
      2015: '400',
    },
    productB: {
      2012: '2000',
      2013: '100',
      2014: '3000',
      2015: '370',
    },
  },
]

There is a similar question, but I was not able to adapt its answer to my needs due to my limited knowledge. Using the answer to that question, I can write this snippet:

json = (df.groupby(['country','continent'], as_index=False)
.apply(lambda x: dict(zip(x.year,x.productA)))
.reset_index()
.rename(columns={0:'productA'})
.to_json(orient='records'))

which results in:

[
  {
    country: 'NLD',
    continent: 'Europe',
    productA: {
      2012: '1000',
      2013: '100',
      2014: '150',
      2015: '200',
    },
  },
  {
    country: 'CAN',
    continent: 'America',
    productA: {
      2012: '30',
      2013: '50',
      2014: '200',
      2015: '20',
    },
  },
  {
    country: 'JPN',
    continent: 'Asia',
    productA: {
      2012: '100',
      2013: '400',
      2014: '300',
      2015: '400',
    },
  },
]

I would be most grateful if you could help me reach the desired output (including productB) and suggest resources that I could use to improve my data wrangling skills with pandas.

Thank you!

Answer 1

Notice that df.to_dict() does almost what you want (even the default orientation is right; see the documentation for other options). To get the country and continent values, just loop over the groups:

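dictlist = []
# Iterating over a groupby yields (key, sub-DataFrame) pairs;
# here the key is a (country, continent) tuple.
for i, j in df.groupby(['country', 'continent']):
    thedict = j.to_dict()          # {column: {row_index: value}}
    thedict["country"] = i[0]      # replace the per-row dicts with single values
    thedict["continent"] = i[1]
    dictlist.append(thedict)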

I am pretty sure that some small variation on this will do what you want.
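
One such variation might look like the sketch below. It is only an illustration under a few assumptions: the helper name nest_products and the inline CSV_TEXT string are made up for this example, and years and values are cast to strings because the desired output in the question shows quoted values. Indexing each group by year before building the dicts is what makes the inner keys years rather than row numbers.

```python
import io
import json

import pandas as pd

# Inline copy of the csv from the question (illustrative name).
CSV_TEXT = """country,continent,year,productA,productB
NLD,Europe,2012,1000,500
NLD,Europe,2013,100,50
NLD,Europe,2014,150,40
NLD,Europe,2015,200,70
CAN,America,2012,30,40
CAN,America,2013,50,90
CAN,America,2014,200,2000
CAN,America,2015,20,30
JPN,Asia,2012,100,2000
JPN,Asia,2013,400,100
JPN,Asia,2014,300,3000
JPN,Asia,2015,400,370
"""


def nest_products(df, value_cols=("productA", "productB")):
    """Return one dict per (country, continent) with a {year: value} dict per product."""
    records = []
    # sort=False keeps the groups in the order they first appear in the file.
    for (country, continent), group in df.groupby(["country", "continent"], sort=False):
        record = {"country": country, "continent": continent}
        by_year = group.set_index("year")
        for col in value_cols:
            # Cast keys and values to str to mirror the quoted values in the question.
            record[col] = {str(year): str(value) for year, value in by_year[col].items()}
        records.append(record)
    return records


df = pd.read_csv(io.StringIO(CSV_TEXT))
nested = nest_products(df)
print(json.dumps(nested, indent=2))
```

Building plain Python dicts and serializing them with json.dumps sidesteps the question of how to_json would render nested objects. If numeric values are preferred over strings, the str(value) cast could be swapped for int(value).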

Solution 1
0 2020-05-07 02:18:51