Good evening!
I am working on a project where I want to extract JSON data from a website and import it into an Excel/CSV file. I scrape the page with Selenium, then parse the result with json.loads and flatten it with pd.json_normalize. When I print the output of json_normalize, not all of the data appears. What I want to do is select some of the fields and present them neatly.
The JSON data from the website:
{
  "getUrl": "/395012/Organization/pase10001",
  "className": "Organization",
  "data": {
    "name": "ICA Supermarket",
    "organizationNumber": "556589-4341",
    "centralPhoneNumber": {
      "value": "044-310010",
      "normalized": "+4644310010",
      "className": "PhoneNumber",
      "isEmpty": false
    },
    "faxPhoneNumber": {
      "value": null,
      "normalized": null,
      "className": "PhoneNumber",
      "isEmpty": true
    },
    "website": "www.ica.se",
    "email": {
      "value": "kundkontakt.tollarp@supermarket.ica.se",
      "className": "Email"
    },
    "dateLastModified": "/Date(1621342946134+0200)/",
    "visitAddress": {
      "street": "Polgatan 5",
      "zipCode": "298 32",
      "city": "TOLLARP",
      "countryCode": ""
    },
    "postalAddress": {
      "street": "Box 24",
      "zipCode": "298 21",
      "city": "TOLLARP",
      "countryCode": ""
    },
    "responsibleCoworker": null,
    "integrationid": "",
    "customFields": [],
    "relation": 0,
    "tags": [],
    "headOffice": null,
    "corporateGroup": null,
    "sharedBody": {
      "vatNumber": "SE556589434101",
      "lineOfBusiness": "Livsmedelshandel med brett sortiment, ej varuhus eller stormarknad",
      "businessDescription": "Bolaget skall som medlem i ICA-förbundet bedriva detaljhandelsrörelse med dagligvaror och annan därmed förenlig verksamhet.",
      "legalForm": "Aktiebolag",
      "dateOfRegistration": "2000-04-06",
      "legalName": "Superlivs i Tollarp AB",
      "rating": null,
      "numberOfSubsidaries": 0,
      "numberOfEmployeesRange": "20 - 49",
      "numberOfEmployeesWorkSite": "20 - 49"
I want to select certain fields from this JSON data and export them to an Excel document. For example, I want the Excel sheet to have columns in this order:
Company | organization |
---|---|
name1 | 000000000000 |
What I have tried so far:
import json
import pandas as pd

res = self.driver.find_element_by_tag_name("pre").text
data = json.loads(res)
xd = pd.json_normalize(data)
xd.to_excel("output.xlsx")
I am new to Python and trying to learn as much as possible. It would really make my day if you could help me get further with this project!
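The columns that seem to be missing from the printout are most likely still in the DataFrame: pandas abbreviates wide frames with "..." when displaying them. A minimal sketch of one way to pick fields with pandas alone, assuming the payload looks like the JSON above: json_normalize flattens nested keys into dotted column names such as data.name, which can then be selected and renamed. The sample string is a trimmed copy of the question's JSON, and the header labels follow the example table.

```python
import json
import pandas as pd

# Trimmed copy of the JSON from the question; in the real script this
# would be the text read from the <pre> element.
res = """
{
  "getUrl": "/395012/Organization/pase10001",
  "className": "Organization",
  "data": {
    "name": "ICA Supermarket",
    "organizationNumber": "556589-4341",
    "centralPhoneNumber": {"value": "044-310010", "normalized": "+4644310010"}
  }
}
"""

data = json.loads(res)

# Nested keys become dotted column names, e.g. "data.centralPhoneNumber.value".
flat = pd.json_normalize(data)
print(list(flat.columns))

# Map the flattened column names to the headers wanted in the export.
columns = {"data.name": "Company", "data.organizationNumber": "Organization"}
table = flat[list(columns)].rename(columns=columns)

# Passing a path writes a file instead; to_excel needs openpyxl installed.
# table.to_csv("output.csv", index=False)
# table.to_excel("output.xlsx", index=False)
csv_text = table.to_csv(index=False)
print(csv_text)
```

Passing index=False keeps the DataFrame's row index out of the exported file.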
You could also try the convtools library, which lets you build converters dynamically. A cheatsheet is available in its documentation.
import json
from convtools import conversion as c

data = json.loads(res)
# Build a converter that maps each input dict to the selected fields.
converter = c.list_comp({
    "company": c.item("data", "name"),
    "org_number": c.item("data", "organizationNumber"),
}).gen_converter(debug=True)  # install "black" to see formatted sources
prepared_data = converter([data])
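Once prepared_data is a list of flat dicts, writing it out needs nothing beyond the standard library. A minimal sketch, assuming prepared_data has the shape the converter above would produce; the sample row is copied from the question's JSON, and rows_to_csv is a hypothetical helper name, not part of convtools.

```python
import csv
import io

# Assumed shape of the converter output: one flat dict per organization.
prepared_data = [
    {"company": "ICA Supermarket", "org_number": "556589-4341"},
]

def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text using the given column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = rows_to_csv(prepared_data, ["company", "org_number"])
print(csv_text)
```

To write a file rather than a string, the same DictWriter can be pointed at open("output.csv", "w", newline="", encoding="utf-8"). Excel opens CSV files directly; for a real .xlsx file, pd.DataFrame(prepared_data).to_excel("output.xlsx", index=False) should also work.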