Pandas json_normalize and JSON flattening error

Question

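Pandas newbie here, struggling to understand why I can't completely flatten the JSON I receive from an API. I need a DataFrame containing all of the data the API returns, but I need every piece of nested data expanded into its own column so that I can work with it.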

The JSON I receive is as follows:

[
   {
      "query":{
         "id":"1596487766859-3594dfce3973bc19",
         "name":"test"      
      },
      "webPage":{
         "inLanguages":[
            {
               "code":"en"            
            }
         ]      
      },
      "product":{
         "name":"Test",
         "description":"Test2",
         "mainImage":"image1.jpg",
         "images":[
            "image2.jpg",
            "image3.jpg"         
         ],
         "offers":[
            {
               "price":"45.0",
               "currency":"€"            
            }
         ],
         "probability":0.9552192      
      }
    }
  ]

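Running pd.json_normalize(data) without any additional parameters leaves the nested price and currency values inside the product.offers column. When I try to split them into their own columns with the following: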

pd.json_normalize(data,record_path=['product',meta['product',['offers']]])

I end up with the following error: f"{js} has non list value {result} for path {spec}. "

Any help would be much appreciated.

Answer 1

I've used this technique a few times:

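1. Run an initial pd.json_normalize() to discover the columns.
2. Build the meta parameter by inspecting that result and the original JSON. NB: an index out of range is possible here.
3. Only one list can drive the record_path parameter.
4. A small trick: product/images is a list of plain values, so its column gets named 0. Rename it.
5. Do a Cartesian product to merge the two DataFrames produced by breaking down the lists. It's not very stable.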
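
To illustrate the remarks about record_path, here is a minimal sketch using a trimmed copy of the question's data. It assumes a reasonably recent pandas; the exact wording of the exception differs between versions, so only the exception type is checked. The names error, offers and images are illustrative only.

```python
import pandas as pd

# Trimmed copy of the JSON from the question.
data = [{
    "query": {"id": "1596487766859-3594dfce3973bc19", "name": "test"},
    "product": {
        "name": "Test",
        "images": ["image2.jpg", "image3.jpg"],
        "offers": [{"price": "45.0", "currency": "€"}],
    },
}]

# "product" is a dict, not a list, so it cannot drive record_path.
try:
    pd.json_normalize(data, record_path=["product"])
    error = None
except TypeError as exc:
    error = str(exc)

# A path that ends at a list works; each list element becomes one row.
offers = pd.json_normalize(data, record_path=[["product", "offers"]])

# A list of plain strings produces a single column labelled 0.
images = pd.json_normalize(data, record_path=[["product", "images"]])

print(error)
print(offers)
print(images)
```

The nested form [["product", "offers"]] is the one the answer's code relies on: the whole inner list is treated as a single path into each record.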


import pandas as pd

data = [{'query': {'id': '1596487766859-3594dfce3973bc19', 'name': 'test'},
  'webPage': {'inLanguages': [{'code': 'en'}]},
  'product': {'name': 'Test',
   'description': 'Test2',
   'mainImage': 'image1.jpg',
   'images': ['image2.jpg', 'image3.jpg'],
   'offers': [{'price': '45.0', 'currency': '€'}],
   'probability': 0.9552192}}]

# build the default frame to discover the column names
df = pd.json_normalize(data)
# turn each dotted column name into a path list for the meta parameter
mymeta = [c.split(".") for c in df.columns]
# exclude list-valued paths from meta; the lookup below assumes every path is
# exactly [outer, inner] and can fail (e.g. index out of range) for other depths
mymeta = [l for l in mymeta if not isinstance(data[0][l[0]][l[1]], list)]

# you can build a df from either of the product lists, NOT both
df1 = pd.json_normalize(data, record_path=[["product","offers"]], meta=mymeta)
df2 = pd.json_normalize(data, record_path=[["product","images"]], meta=mymeta).rename(columns={0:"image"})
# to get them together, merge them; the columns overlap heavily, so keep only "image" (plus the join key) from df2
merged = df1.assign(foo=1).merge(
    df2.assign(foo=1).drop(columns=[c for c in df2.columns if c!="image"]), on="foo").drop(columns="foo")
print(merged)
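
As a possible alternative to the temporary foo key, pandas 1.2 and later accept how="cross" in merge, which builds the Cartesian product directly. The sketch below uses small stand-in frames rather than the real df1 and df2; the names offers, images and combined are illustrative only.

```python
import pandas as pd

# Stand-ins for df1 (offers plus meta) and df2 (images plus meta).
offers = pd.DataFrame(
    {"price": ["45.0"], "currency": ["€"], "product.name": ["Test"]}
)
images = pd.DataFrame(
    {"image": ["image2.jpg", "image3.jpg"], "product.name": ["Test", "Test"]}
)

# how="cross" pairs every row on the left with every row on the right,
# so no helper key column is needed. Requires pandas >= 1.2.
combined = offers.merge(images[["image"]], how="cross")

print(combined)
```

Note that any Cartesian product, with or without the helper key, pairs every offer with every image. If the API returns more than one record, that would also pair offers with images from other products, so merging on an identifying meta column (for example query.id) is probably safer in that case.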

Solution 1
0 Accepted 2020-08-04 18:06:16
