
Convert JSON file into a custom table using Python Pandas

Hello StackOverflow Community,

I am kind of new to this whole JSON file format and also a beginner in Python. I have a custom JSON file whose data is nested and sub-nested. I am trying to convert it into a tabular format using Python.

I don't really know how to proceed. I looked at some questions here where people flattened the file and then wrote some Python to bring it into shape. I tried that and it didn't go well.

I will add the JSON file and the expected output in CSV format. Please have a look at it and let me know of any ideas that I can use to make it work.

In the below JSON file, for each Company, we have specific Product IDs that have data and some are null. The ProductIDs which have data have an assigned value in predictions and the rest of them will be NULL.

I would also like some suggestions on whether modifying the JSON data, by adding a couple more identifiers or removing some, would help us achieve the end output. (For example, adding Product Value and Product Description for all of the sub-categories wherever we only have a value.)

[
    {
        "predictions": {
            "CC__108": 0.948093,
            "CC__111": 0.897565
        },
        "CompanyID": "CI10001",
        "SHAP_ACT_CC__1": null,
        "SHAP_ACT_CC__2": null,
        "SHAP_ACT_CC__108": {
            "Product Category": "Toys",
            "Board Games": {
                "Monopoly": 35.99,
                "The Game of Life": 39.99,
                "The Clue": 20.45
            },
            "Soft Toys": {
                "Bear": {
                    "Product Value": 5.78,
                    "Product Description": "A soft bear toy"
                }
            },
            "Electronic Toys": {
                "Digital pet": 59.99,
                "Entertainment robot": 100.99
            },
            "Puzzle": {
                "Baloon Puzzle": 10.99
            },
            "Rubik's Cube": {
                "3x3 cube": {
                    "Product Value": 5.99,
                    "Product Description": "3x3 rubik's cube"
                }
            }
        },
        "SHAP_ACT_CC__109": null,
        "SHAP_ACT_CC__110": null,
        "SHAP_ACT_CC__111": {
            "Product Category": "Books",
            "Action and Adventure": {
                "Life of Pi": 14.99,
                "The Call of the Wild": 9.99
            },
            "Classics": {
                "Little Women": {
                    "Product Value": 10.99,
                    "Product Description": "Illustrated Edition"
                },
                "Beloved": {
                    "Product Value": 12.99,
                    "Product Description": "Winner of the Nobel Prize"
                }
            },
            "Comics": {
                "Watchmen": 14.99,
                "Avengers": 15.99
            },
            "Fantasy": {
                "Ninth House": 18.99
            },
            "Historical": {}
        },
        "SHAP_ACT_CC__115": null,
        "SHAP_ACT_CC__116": null
    },
    {
        "predictions": {
            "CC__124": 0.81234,
            "CC__85": 0.78943
        },
        "CompanyID": "CI10002",
        "SHAP_ACT_CC__18": null,
        "SHAP_ACT_CC__24": null,
        "SHAP_ACT_CC__124": {
            "Product Category": "Vehicles",
            "Military Aircraft": {
                "Attack Airplanes": 10000.99,
                "Bomber Airplanes": 15000.99
            },
            "Airplanes": {
                "Cargo Airplanes": {
                    "Product Value": 20000.99,
                    "Product Description": "Cargo Transport"
                }
            },
            "ATV": {},
            "Automobiles": {},
            "Bicycles": {}
        },
        "SHAP_ACT_CC__134": null,
        "SHAP_ACT_CC__135": null,
        "SHAP_ACT_CC__85": {
            "Product Category": "Boats",
            "Fishing Boats": {
                "Smudger": 25000.99,
                "Campion": 30000.99
            },
            "Dinghy Boats": {
                "Lowe": {
                    "Product Value": 10000.99,
                    "Product Description": "lowes dinghy boat"
                },
                "Pond King": {
                    "Product Value": 8000.99,
                    "Product Description": "king of the pond"
                }
            },
            "Deck Boats": {
                "Sea Ark": 45000.99
            },
            "Bowrider Boats": {},
            "House Boats": {
                "World Cat": {
                    "Product Value": 15000.99,
                    "Product Description": "3 bedroom house boat"
                }
            }
        },
        "SHAP_ACT_CC__149": null,
        "SHAP_ACT_CC__150": null
    }
]

Output Data (CSV)

CompanyID,Product ID,Product Category,Product Type,Product Name,Product Value,Product Description
CI10001,SHAP_ACT_CC__108,Toys,Board Games,Monopoly,35.99,
CI10001,SHAP_ACT_CC__108,Toys,Board Games,The Game of Life,39.99,
CI10001,SHAP_ACT_CC__108,Toys,Board Games,The Clue,20.45,
CI10001,SHAP_ACT_CC__108,Toys,Soft Toys,Bear,5.78,A soft bear toy
CI10001,SHAP_ACT_CC__108,Toys,Electronic Toys,Digital pet,59.99,
CI10001,SHAP_ACT_CC__108,Toys,Electronic Toys,Entertainment robot,100.99,
CI10001,SHAP_ACT_CC__108,Toys,Puzzle,Baloon Puzzle,10.99,
CI10001,SHAP_ACT_CC__108,Toys,Rubik's Cube,3x3 cube,5.99,3x3 rubik's cube
CI10001,SHAP_ACT_CC__111,Books,Action and Adventure,Life of Pi,14.99,
CI10001,SHAP_ACT_CC__111,Books,Action and Adventure,The Call of the Wild,9.99,
CI10001,SHAP_ACT_CC__111,Books,Classics,Little Women,10.99,Illustrated Edition
CI10001,SHAP_ACT_CC__111,Books,Classics,Beloved,12.99,Winner of the Nobel Prize
CI10001,SHAP_ACT_CC__111,Books,Comics,Watchmen,14.99,
CI10001,SHAP_ACT_CC__111,Books,Comics,Avengers,15.99,
CI10001,SHAP_ACT_CC__111,Books,Fantasy,Ninth House,18.99,
CI10001,SHAP_ACT_CC__111,Books,Historical,,,
CI10002,SHAP_ACT_CC__124,Vehicles,Military Aircraft,Attack Airplanes,10000.99,
CI10002,SHAP_ACT_CC__124,Vehicles,Military Aircraft,Bomber Airplanes,15000.99,
CI10002,SHAP_ACT_CC__124,Vehicles,Airplanes,Cargo Airplanes,20000.99,Cargo Transport
CI10002,SHAP_ACT_CC__124,Vehicles,ATV,,,
CI10002,SHAP_ACT_CC__124,Vehicles,Automobiles,,,
CI10002,SHAP_ACT_CC__124,Vehicles,Bicycles,,,
CI10002,SHAP_ACT_CC__85,Boats,Fishing Boats,Smudger,25000.99,
CI10002,SHAP_ACT_CC__85,Boats,Fishing Boats,Campion,30000.99,
CI10002,SHAP_ACT_CC__85,Boats,Dinghy Boats,Lowe,10000.99,lowes dinghy boat
CI10002,SHAP_ACT_CC__85,Boats,Dinghy Boats,Pond King,8000.99,king of the pond
CI10002,SHAP_ACT_CC__85,Boats,Deck Boats,Sea Ark,45000.99,
CI10002,SHAP_ACT_CC__85,Boats,Bowrider Boats,,,
CI10002,SHAP_ACT_CC__85,Boats,House Boats,World Cat,15000.99,3 bedroom house boat


Thank you for your help. Any ideas or additional links would be helpful and please let me know if I need to add any additional information or change the format of the post.

There's no straightforward way to handle such an idiosyncratic JSON schema. You need to work through the schema from the innermost items all the way up, handling each type of object in turn.

Product information

The innermost field has the product details, which can be a single number representing the value

14.99

or an object with keys for value and description

{
  "Product Value": 10.99,
  "Product Description": "Illustrated Edition"
}

You can handle it like this:

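def process_product(product):
    # A product is either a bare number (the value) or an object
    # with 'Product Value' and 'Product Description' keys.
    if isinstance(product, dict):
        value = product['Product Value']
        description = product['Product Description']
    else:
        value = product
        description = None

    return value, description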
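If some product objects might lack one of the two keys, a slightly more defensive variant could look like the sketch below. The name `process_product_safe` and the idea that keys may be missing are assumptions; the sample JSON in the question always has both keys.

```python
def process_product_safe(product):
    """Return (value, description) for either product shape."""
    if isinstance(product, dict):
        # dict.get returns None for an absent key instead of raising KeyError
        return product.get('Product Value'), product.get('Product Description')
    # A bare number carries only the value
    return product, None


print(process_product_safe(14.99))
print(process_product_safe({"Product Value": 10.99,
                            "Product Description": "Illustrated Edition"}))
print(process_product_safe({"Product Value": 5.0}))  # hypothetical incomplete object
```

The trade-off is that a misspelled key silently becomes `None` here, whereas the strict version fails loudly with a `KeyError`.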

Product Categories

Then come product categories, each with a special Product Category key and multiple keys for product types

{
  "Product Category": "Books",
  "Action and Adventure": {
    "Life of Pi": 14.99,
    "The Call of the Wild": 9.99
  },
  "Classics": {...}
}

Which you can process with the following function:

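def process_category(category):
    for product_type, products in category.items():
        # 'Product Category' holds the category name, not a product type
        if product_type == 'Product Category':
            continue
        # Empty product types (e.g. "Historical": {}) yield no rows
        for product_name, product_data in products.items():
            yield (
                category['Product Category'],
                product_type,
                product_name,
                *process_product(product_data),
            )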
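Note that the expected CSV in the question contains rows for empty product types (for example `Books,Historical,,,`), while the function above emits nothing for them. If those placeholder rows are wanted, one possible variant is sketched here. The name `process_category_keep_empty` is made up for illustration, and `process_product` is repeated only so the snippet runs on its own.

```python
def process_product(product):
    # Same logic as the function shown earlier in this answer
    if isinstance(product, dict):
        return product['Product Value'], product['Product Description']
    return product, None


def process_category_keep_empty(category):
    name = category['Product Category']
    for product_type, products in category.items():
        if product_type == 'Product Category':
            continue
        if not products:
            # Empty product type: emit one row with blank name, value, description
            yield name, product_type, None, None, None
            continue
        for product_name, product_data in products.items():
            yield (name, product_type, product_name,
                   *process_product(product_data))


books = {
    "Product Category": "Books",
    "Fantasy": {"Ninth House": 18.99},
    "Historical": {},
}
rows = list(process_category_keep_empty(books))
for row in rows:
    print(row)
```

Swapping this in for `process_category` should leave the rest of the pipeline unchanged, since each row still has five fields.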

Companies

The final object in the hierarchy is the company and each has a CompanyID key and several product categories

{
  "CompanyID": "CI10001",
  "SHAP_ACT_CC__1": null,
  "SHAP_ACT_CC__108": {
    "Product Category": "Toys",
    "Board Games": {...}
  }
}

And the function works similarly to the previous ones:

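def process_company(company):
    for key, data in company.items():
        # Skip 'predictions', 'CompanyID', and SHAP_ACT keys whose value is null
        if key.startswith('SHAP_ACT') and data is not None:
            for category_data in process_category(data):
                # Parenthesized so the unpacking also works before Python 3.8
                yield (company['CompanyID'], key, *category_data)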

Multiple Companies

Now write a function to process all the records in the array:

def process_data(companies):
    for company in companies:
        for company_data in process_company(company):
            yield {
                'Company ID': company_data[0],
                'Product ID': company_data[1],
                'Product Category': company_data[2],
                'Product Type': company_data[3],
                'Product Name': company_data[4],
                'Product Value': company_data[5],
                'Product Description': company_data[6],
            }

Dataframe

import pandas as pd

pd.DataFrame(list(process_data(data)))  # data is the list parsed from the JSON file

Output

   Company ID        Product ID Product Category          Product Type          Product Name  Product Value        Product Description
0     CI10001  SHAP_ACT_CC__108             Toys           Board Games              Monopoly          35.99                       None
1     CI10001  SHAP_ACT_CC__108             Toys           Board Games      The Game of Life          39.99                       None
2     CI10001  SHAP_ACT_CC__108             Toys           Board Games              The Clue          20.45                       None
3     CI10001  SHAP_ACT_CC__108             Toys             Soft Toys                  Bear           5.78            A soft bear toy
4     CI10001  SHAP_ACT_CC__108             Toys       Electronic Toys           Digital pet          59.99                       None
5     CI10001  SHAP_ACT_CC__108             Toys       Electronic Toys   Entertainment robot         100.99                       None
6     CI10001  SHAP_ACT_CC__108             Toys                Puzzle         Baloon Puzzle          10.99                       None
7     CI10001  SHAP_ACT_CC__108             Toys          Rubik's Cube              3x3 cube           5.99           3x3 rubik's cube
8     CI10001  SHAP_ACT_CC__111            Books  Action and Adventure            Life of Pi          14.99                       None
9     CI10001  SHAP_ACT_CC__111            Books  Action and Adventure  The Call of the Wild           9.99                       None
10    CI10001  SHAP_ACT_CC__111            Books              Classics          Little Women          10.99        Illustrated Edition
11    CI10001  SHAP_ACT_CC__111            Books              Classics               Beloved          12.99  Winner of the Nobel Prize
12    CI10001  SHAP_ACT_CC__111            Books                Comics              Watchmen          14.99                       None
13    CI10001  SHAP_ACT_CC__111            Books                Comics              Avengers          15.99                       None
14    CI10001  SHAP_ACT_CC__111            Books               Fantasy           Ninth House          18.99                       None
15    CI10002  SHAP_ACT_CC__124         Vehicles     Military Aircraft      Attack Airplanes       10000.99                       None
16    CI10002  SHAP_ACT_CC__124         Vehicles     Military Aircraft      Bomber Airplanes       15000.99                       None
17    CI10002  SHAP_ACT_CC__124         Vehicles             Airplanes       Cargo Airplanes       20000.99            Cargo Transport
18    CI10002   SHAP_ACT_CC__85            Boats         Fishing Boats               Smudger       25000.99                       None
19    CI10002   SHAP_ACT_CC__85            Boats         Fishing Boats               Campion       30000.99                       None
20    CI10002   SHAP_ACT_CC__85            Boats          Dinghy Boats                  Lowe       10000.99          lowes dinghy boat
21    CI10002   SHAP_ACT_CC__85            Boats          Dinghy Boats             Pond King        8000.99           king of the pond
22    CI10002   SHAP_ACT_CC__85            Boats            Deck Boats               Sea Ark       45000.99                       None
23    CI10002   SHAP_ACT_CC__85            Boats           House Boats             World Cat       15000.99       3 bedroom house boat
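If the end goal is only the CSV file, the rows can also be written with the standard library, without going through a DataFrame. The following is a rough sketch under a few assumptions: `write_rows_csv` and `FIELDS` are illustrative names, the rows are dictionaries shaped like those yielded by `process_data`, and the `Company ID` key is renamed to `CompanyID` to match the header requested in the question.

```python
import csv
import io

FIELDS = ['CompanyID', 'Product ID', 'Product Category', 'Product Type',
          'Product Name', 'Product Value', 'Product Description']


def write_rows_csv(rows, fileobj):
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS, lineterminator='\n')
    writer.writeheader()
    for row in rows:
        row = dict(row)  # copy so the caller's dictionary is not modified
        if 'Company ID' in row:
            # Rename to the header used in the expected CSV
            row['CompanyID'] = row.pop('Company ID')
        # The csv module writes None as an empty field
        writer.writerow(row)


# One hand-written row standing in for the output of process_data(data)
sample = [{
    'Company ID': 'CI10001',
    'Product ID': 'SHAP_ACT_CC__108',
    'Product Category': 'Toys',
    'Product Type': 'Board Games',
    'Product Name': 'Monopoly',
    'Product Value': 35.99,
    'Product Description': None,
}]

buf = io.StringIO()
write_rows_csv(sample, buf)
print(buf.getvalue())
```

To write a real file, pass a handle opened with `open(path, 'w', newline='')` instead of the `StringIO` buffer. With pandas, calling `to_csv(path, index=False)` on the DataFrame gives a similar result.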
