Pandas json_normalize and JSON flattening error

A pandas newbie here, struggling to understand why I'm unable to completely flatten a JSON response I receive from an API. I need a DataFrame with all the data returned by the API, but every nested value has to be expanded into its own column before I can use it.

The JSON I receive is as follows:

[
   {
      "query":{
         "id":"1596487766859-3594dfce3973bc19",
         "name":"test"
      },
      "webPage":{
         "inLanguages":[
            {
               "code":"en"
            }
         ]
      },
      "product":{
         "name":"Test",
         "description":"Test2",
         "mainImage":"image1.jpg",
         "images":[
            "image2.jpg",
            "image3.jpg"
         ],
         "offers":[
            {
               "price":"45.0",
               "currency":"€"
            }
         ],
         "probability":0.9552192
      }
   }
]

Running pd.json_normalize(data) without any additional parameters shows the nested values price and currency in the product.offers column. When I try to separate these out into their own columns with the following:

pd.json_normalize(data,record_path=['product',meta['product',['offers']]])

I end up with the following error: f"{js} has non list value {result} for path {spec}. "

Any help would be much appreciated.
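The message means that the value found at record_path is not a list. Here product is a dict, so it is the likely culprit. A minimal sketch that reproduces the error and shows a working call, assuming a reasonably recent pandas. The list-inside-a-list record_path form is the one the answer's code relies on. It is not spelled out in the pandas docs, so treat it as observed behaviour rather than a documented guarantee:

```python
import pandas as pd

data = [{
    "query": {"id": "1596487766859-3594dfce3973bc19", "name": "test"},
    "webPage": {"inLanguages": [{"code": "en"}]},
    "product": {
        "name": "Test",
        "description": "Test2",
        "mainImage": "image1.jpg",
        "images": ["image2.jpg", "image3.jpg"],
        "offers": [{"price": "45.0", "currency": "€"}],
        "probability": 0.9552192,
    },
}]

type_error = key_error = None

# "product" is a dict, not a list, so it cannot be a record_path target
try:
    pd.json_normalize(data, record_path="product")
except TypeError as exc:
    type_error = exc
    print("TypeError:", exc)

# The flat path descends into "product" first, so the meta path
# ["query", "id"] is then looked up inside the product dict and is not found
try:
    pd.json_normalize(data, record_path=["product", "offers"], meta=[["query", "id"]])
except KeyError as exc:
    key_error = exc
    print("KeyError:", exc)

# One nested path is resolved from each top-level record, so full meta
# paths work: one row per offer, parent fields repeated on every row
offers = pd.json_normalize(
    data,
    record_path=[["product", "offers"]],
    meta=[["query", "id"], ["product", "name"]],
)
print(offers)
```

The resulting frame has price and currency as real columns, plus query.id and product.name copied from the parent record.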

I've used this technique a few times:

  1. Do an initial pd.json_normalize() to discover the columns.
  2. Build the meta parameter by inspecting those columns and the original JSON. NB: a possible index out of range here.
  3. You can only request one list at a time, and that list drives the record_path param.
  4. A few tricks: product/images is a list of strings, so its column gets named 0. Rename it.
  5. I did a Cartesian product to merge the two data frames produced by breaking down the lists. It's not especially robust.
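Steps 1 to 3 can also be done by walking the first record directly instead of splitting column names. This avoids the index-out-of-range problem in step 2 and also lists which paths are lists, meaning candidates for record_path. A minimal sketch: split_paths is a hypothetical helper, not a pandas function. Like the answer, it only inspects data[0], so keys that appear only in later records are missed:

```python
import pandas as pd

data = [{
    "query": {"id": "1596487766859-3594dfce3973bc19", "name": "test"},
    "webPage": {"inLanguages": [{"code": "en"}]},
    "product": {
        "name": "Test",
        "description": "Test2",
        "mainImage": "image1.jpg",
        "images": ["image2.jpg", "image3.jpg"],
        "offers": [{"price": "45.0", "currency": "€"}],
        "probability": 0.9552192,
    },
}]


def split_paths(obj, prefix=()):
    """Hypothetical helper: walk a nested dict and return
    (scalar_paths, list_paths), each a list of key paths like ["query", "id"]."""
    scalars, lists = [], []
    for key, value in obj.items():
        path = [*prefix, key]
        if isinstance(value, dict):
            sub_scalars, sub_lists = split_paths(value, path)
            scalars += sub_scalars
            lists += sub_lists
        elif isinstance(value, list):
            lists.append(path)    # candidate for record_path
        else:
            scalars.append(path)  # usable as a meta path
    return scalars, lists


meta, list_paths = split_paths(data[0])
print(list_paths)  # [['webPage', 'inLanguages'], ['product', 'images'], ['product', 'offers']]

# The meta paths are full paths from the top-level record, so they pair
# with the nested record_path=[[...]] form
offers = pd.json_normalize(data, record_path=[["product", "offers"]], meta=meta)
print(offers.columns.tolist())
```

A top-level scalar key simply yields a one-element path such as ["id"], which json_normalize also accepts as meta.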

import pandas as pd

data = [{'query': {'id': '1596487766859-3594dfce3973bc19', 'name': 'test'},
  'webPage': {'inLanguages': [{'code': 'en'}]},
  'product': {'name': 'Test',
   'description': 'Test2',
   'mainImage': 'image1.jpg',
   'images': ['image2.jpg', 'image3.jpg'],
   'offers': [{'price': '45.0', 'currency': '€'}],
   'probability': 0.9552192}}]

# step 1: default normalization, only used to discover the column names
df = pd.json_normalize(data)
# step 2: turn the dotted column names into key paths for the meta param
mymeta = [c.split(".") for c in df.columns]
# exclude paths that point at lists (those can't be meta). NB: this assumes every
# column is at least two levels deep (l[1] raises IndexError otherwise) and only
# inspects the first record
mymeta = [l for l in mymeta if not isinstance(data[0][l[0]][l[1]], list)]

# step 3: you can build a df from either of the product lists, NOT both.
# The nested [["product", "offers"]] is one path resolved from the top-level
# record, which is why the full meta paths above still apply
df1 = pd.json_normalize(data, record_path=[["product","offers"]], meta=mymeta)
# step 4: images is a list of strings, so its column is named 0; rename it
df2 = pd.json_normalize(data, record_path=[["product","images"]], meta=mymeta).rename(columns={0:"image"})
# step 5: Cartesian product via a dummy key foo. df2's other columns duplicate
# df1's, so keep only "image" from df2
result = df1.assign(foo=1).merge(
    df2.assign(foo=1).drop(columns=[c for c in df2.columns if c!="image"]), on="foo").drop(columns="foo")
print(result)
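The dummy foo key in step 5 pairs every offer with every image across the whole input, so with more than one top-level record it would also pair the offers of one record with the images of another. That may be part of why the merge feels unstable. A sketch of two alternatives, assuming pandas 1.2 or later for how="cross", and assuming query.id uniquely identifies each top-level record (an assumption about this API's data, not something stated in the question):

```python
import pandas as pd

data = [{
    "query": {"id": "1596487766859-3594dfce3973bc19", "name": "test"},
    "webPage": {"inLanguages": [{"code": "en"}]},
    "product": {
        "name": "Test",
        "description": "Test2",
        "mainImage": "image1.jpg",
        "images": ["image2.jpg", "image3.jpg"],
        "offers": [{"price": "45.0", "currency": "€"}],
        "probability": 0.9552192,
    },
}]

offers = pd.json_normalize(
    data, record_path=[["product", "offers"]], meta=[["query", "id"], ["product", "name"]]
)
images = pd.json_normalize(
    data, record_path=[["product", "images"]], meta=[["query", "id"]]
).rename(columns={0: "image"})

# Same Cartesian product as the foo=1 trick, without a helper column
cross = offers.merge(images[["image"]], how="cross")

# Pair offers and images only within the same top-level record
# (assumes query.id is unique per record)
per_record = offers.merge(images, on="query.id")
print(per_record)
```

For this single-record input both give the same two rows (one per image). They only differ once data holds several records.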
