
Convert csv file into nested JSON-like file using pandas

I'm a pandas/Python newbie and was wondering if you could help me with the following issue.

Consider the following csv file:

country,continent,year,productA,productB
NLD,Europe,2012,1000,500
NLD,Europe,2013,100,50
NLD,Europe,2014,150,40
NLD,Europe,2015,200,70
CAN,America,2012,30,40
CAN,America,2013,50,90
CAN,America,2014,200,2000
CAN,America,2015,20,30
JPN,Asia,2012,100,2000
JPN,Asia,2013,400,100
JPN,Asia,2014,300,3000
JPN,Asia,2015,400,370

I would like to rewrite it as a JSON-like file:

[
  {
    country: 'NLD',
    continent: 'Europe',
    productA: {
      2012: '1000',
      2013: '100',
      2014: '150',
      2015: '200',
    },
    productB: {
      2012: '500',
      2013: '50',
      2014: '40',
      2015: '70',
    },
  },
  {
    country: 'CAN',
    continent: 'America',
    productA: {
      2012: '30',
      2013: '50',
      2014: '200',
      2015: '20',
    },
    productB: {
      2012: '40',
      2013: '90',
      2014: '2000',
      2015: '30',
    },
  },
  {
    country: 'JPN',
    continent: 'Asia',
    productA: {
      2012: '100',
      2013: '400',
      2014: '300',
      2015: '400',
    },
    productB: {
      2012: '2000',
      2013: '100',
      2014: '3000',
      2015: '370',
    },
  },
]

This question is similar, but I was not able to adapt its answer to my needs due to my limited knowledge. Using that answer, I can write this snippet:

json = (df.groupby(['country','continent'], as_index=False)
          .apply(lambda x: dict(zip(x.year,x.productA)))
          .reset_index()
          .rename(columns={0:'productA'})
          .to_json(orient='records'))

which results in:

[
  {
    country: 'NLD',
    continent: 'Europe',
    productA: {
      2012: '1000',
      2013: '100',
      2014: '150',
      2015: '200',
    },
  },
  {
    country: 'CAN',
    continent: 'America',
    productA: {
      2012: '30',
      2013: '50',
      2014: '200',
      2015: '20',
    },
  },
  {
    country: 'JPN',
    continent: 'Asia',
    productA: {
      2012: '100',
      2013: '400',
      2014: '300',
      2015: '400',
    },
  },
]

I would be most grateful if you could help me reach the desired output (including productB) and suggest resources that I could use to improve my data wrangling skills with pandas.

Thank you!

Notice that DataFrame.to_dict(), called on each group, does almost what you want; even the default orientation is right (see the documentation for other options). To add the country and continent from the group key, just loop over the groups:

dictlist = []
# Each group key i is a (country, continent) tuple; j holds that group's rows.
for i, j in df.groupby(['country', 'continent']):
    thedict = j.to_dict()
    thedict["country"] = i[0]
    thedict["continent"] = i[1]
    dictlist.append(thedict)

I am pretty sure that some small variation on this will do what you want.
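One such variation might look like the sketch below. It is only an illustration, not the answerer's exact method: the helper name `nest_by_country` and the `CSV_TEXT` constant are made up for this example, and the data is inlined from the question instead of being read from a file. Years and values are converted to strings to mirror the quoted values in the desired output; drop the `str()` around `value` to keep numbers.

```python
import io
import json

import pandas as pd

# Sample data copied from the question (inlined here instead of read from disk).
CSV_TEXT = """country,continent,year,productA,productB
NLD,Europe,2012,1000,500
NLD,Europe,2013,100,50
NLD,Europe,2014,150,40
NLD,Europe,2015,200,70
CAN,America,2012,30,40
CAN,America,2013,50,90
CAN,America,2014,200,2000
CAN,America,2015,20,30
JPN,Asia,2012,100,2000
JPN,Asia,2013,400,100
JPN,Asia,2014,300,3000
JPN,Asia,2015,400,370
"""


def nest_by_country(df, value_cols=("productA", "productB")):
    """Hypothetical helper: one dict per (country, continent) group."""
    records = []
    # sort=False keeps the groups in their order of first appearance in the CSV.
    for (country, continent), group in df.groupby(["country", "continent"], sort=False):
        record = {"country": country, "continent": continent}
        for col in value_cols:
            # Map each year to that column's value, both as strings.
            record[col] = {
                str(year): str(value)
                for year, value in zip(group["year"], group[col])
            }
        records.append(record)
    return records


df = pd.read_csv(io.StringIO(CSV_TEXT))
records = nest_by_country(df)
json_text = json.dumps(records, indent=2)
print(json_text)
```

The result is a plain list of dicts, so `json.dumps` can serialize it directly. To handle more product columns, pass their names through `value_cols`.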
