
How to select data from JSON to Excel with Pandas

Good evening!

I am working on a project where I want to extract JSON data from a website and then export it to an Excel/CSV file. I am scraping the webpage with Selenium and then using json.loads and pd.json_normalize on the result. When I use json_normalize, not all of the data appears when I print it. What I want to do is select some of the data and present it neatly.

The JSON data from the website:

{
  "getUrl": "/395012/Organization/pase10001",
  "className": "Organization",
  "data": {
    "name": "ICA Supermarket",
    "organizationNumber": "556589-4341",
    "centralPhoneNumber": {
      "value": "044-310010",
      "normalized": "+4644310010",
      "className": "PhoneNumber",
      "isEmpty": false
    },
    "faxPhoneNumber": {
      "value": null,
      "normalized": null,
      "className": "PhoneNumber",
      "isEmpty": true
    },
    "website": "www.ica.se",
    "email": {
      "value": "kundkontakt.tollarp@supermarket.ica.se",
      "className": "Email"
    },
    "dateLastModified": "/Date(1621342946134+0200)/",
    "visitAddress": {
      "street": "Polgatan 5",
      "zipCode": "298 32",
      "city": "TOLLARP",
      "countryCode": ""
    },
    "postalAddress": {
      "street": "Box 24",
      "zipCode": "298 21",
      "city": "TOLLARP",
      "countryCode": ""
    },
    "responsibleCoworker": null,
    "integrationid": "",
    "customFields": [],
    "relation": 0,
    "tags": [],
    "headOffice": null,
    "corporateGroup": null,
    "sharedBody": {
      "vatNumber": "SE556589434101",
      "lineOfBusiness": "Livsmedelshandel med brett sortiment, ej varuhus eller stormarknad",
      "businessDescription": "Bolaget skall som medlem i ICA-förbundet bedriva detaljhandelsrörelse med dagligvaror och annan därmed förenlig verksamhet.",
      "legalForm": "Aktiebolag",
      "dateOfRegistration": "2000-04-06",
      "legalName": "Superlivs i Tollarp AB",
      "rating": null,
      "numberOfSubsidaries": 0,
      "numberOfEmployeesRange": "20 - 49",
      "numberOfEmployeesWorkSite": "20 - 49"

I want to select certain fields from this JSON data and export them to an Excel document. For example, I want the Excel sheet to have columns in this order:

Company organization
name1 000000000000

What I have tried so far:

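import json
import pandas as pd

# the page shows the raw JSON inside a <pre> element
res = self.driver.find_element_by_tag_name("pre").text
data = json.loads(res)

# flatten the nested JSON into one row and write it to Excel
xd = pd.json_normalize(data)
xd.to_excel("output.xlsx")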

Screenshot of Excel document

I am new to Python and trying to learn as much as possible. It would really make my day if you could help me get further with this project!

You could also try the convtools library, which lets you build converters dynamically. A cheatsheet is available in its documentation.

import json
from convtools import conversion as c

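data = json.loads(res)  # res is the JSON text scraped in the question

# build a converter that maps each input record to a flat dict
converter = c.list_comp({
    "company": c.item("data", "name"),
    "org_number": c.item("data", "organizationNumber"),
}).gen_converter(debug=True)  # install "black" to see formatted sources

# list_comp iterates over its input, so wrap the single record in a list
prepared_data = converter([data])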
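
For comparison, the same selection can be done with pandas alone, staying close to the attempt in the question. This is only a minimal sketch: the sample dict is a trimmed-down copy of the JSON above, and the "Company" / "Organization" headers are taken from the example layout in the question. json_normalize flattens nested objects into dotted column names, so the fields end up as "data.name" and "data.organizationNumber".

```python
import pandas as pd

# Trimmed-down sample shaped like the JSON in the question (assumption:
# the real response has the same nesting under the "data" key).
data = {
    "className": "Organization",
    "data": {
        "name": "ICA Supermarket",
        "organizationNumber": "556589-4341",
        "visitAddress": {"street": "Polgatan 5", "city": "TOLLARP"},
    },
}

# Nested dicts become dotted column names, e.g. "data.visitAddress.city".
flat = pd.json_normalize(data)

# Flattened columns to keep, mapped to the headers wanted in the sheet.
# The dict order is the column order of the result.
columns = {
    "data.name": "Company",
    "data.organizationNumber": "Organization",
}
df = flat[list(columns)].rename(columns=columns)
print(df)
```

Calling df.to_excel("output.xlsx", index=False) on the result should then write only those two columns, without the index column; writing .xlsx files requires an Excel engine such as openpyxl to be installed. Printing flat.columns is a quick way to see every field that json_normalize produced, since a wide frame is truncated when printed.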


 