I am fairly new to scraping data with Python and am trying to pull the data from this page into a pandas DataFrame, with the column headers as shown on that page.
Right now I have the following code, which pulls the data off the page, but I can't figure out the next steps to get the data into the format I need.
import requests
url = 'https://mspotrace.org.my/Opmc_list/getCBbyfilters'
# Make a single request and reuse the response instead of fetching twice
r = requests.get(url)
page = r.text
You can read the tables from the URL directly using the pandas API.
>>> import pandas as pd
>>> url = 'https://mspotrace.org.my/Opmc_list'
>>> df = pd.read_html(url)
>>> df[0]
The pandas function read_html reads all the tables on the page and returns a list of DataFrames. In your case there is only one table at that URL, so the desired DataFrame is at index 0.
EDIT
Try parsing the JSON text you already have in page:
>>> import json
>>> data = json.loads(page)
>>> df = pd.DataFrame(data)
>>> df
      draw  recordsTotal  recordsFiltered                                               data
0        0          2654             2654  [OPMC31001, Apave Malaysia Sdn Bhd, Part 3, Ka...
1        0          2654             2654  [OPMC31002, Apave Malaysia Sdn Bhd, Part 3, Ko...
2        0          2654             2654  [OPMC31003, Apave Malaysia Sdn Bhd, Part 3, Ko...
3        0          2654             2654  [OPMC31004, Apave Malaysia Sdn Bhd, Part 3, Ko...
4        0          2654             2654  [OPMC31005, Apave Malaysia Sdn Bhd, Part 3, Ko...
...    ...           ...              ...                                                ...
2649     0          2654             2654  [SCCS2333, Trans Certification Interntional Sd...
2650     0          2654             2654  [SCCS2351, TUV Rheinland Malaysia Sdn. Bhd., S...
2651     0          2654             2654  [SCCS1636, DQS Certification (M) Sdn Bhd, SCCS...
2652     0          2654             2654  [SCCS2906, TUV NORD (MALAYSIA) SDN BHD, SCCS, ...
2653     0          2654             2654  [SCCS02085, BSI Services Malaysia Sdn Bhd, SCC...
[2654 rows x 4 columns]
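In that output each record is still a list packed into the single data column. One possible next step, sketched here under assumptions, is to build the DataFrame from the "data" list alone so that every list element becomes its own column. The sample payload below is a made-up stand-in for the real response text, the helper name rows_to_frame is hypothetical, and the column names are placeholders: the real headers have to be copied from the page, since they are not part of the JSON shown above.

```python
import json

import pandas as pd

# Hypothetical stand-in for the JSON text held in `page`; the real response
# has the same top-level keys but many more rows and more fields per row.
sample_page = json.dumps({
    "draw": 0,
    "recordsTotal": 2,
    "recordsFiltered": 2,
    "data": [
        ["OPMC31001", "Apave Malaysia Sdn Bhd", "Part 3"],
        ["OPMC31002", "Apave Malaysia Sdn Bhd", "Part 3"],
    ],
})


def rows_to_frame(page_text, columns=None):
    """Expand the row lists under the "data" key into one column per field."""
    payload = json.loads(page_text)
    frame = pd.DataFrame(payload["data"])
    if columns is not None:
        # Column names are supplied by the caller; they must match the row width.
        frame.columns = columns
    return frame


# Placeholder names (assumption): replace with the headers shown on the page.
columns = ["code", "certification_body", "scheme"]
df = rows_to_frame(sample_page, columns)
print(df)
```

With the real response, rows_to_frame(page) without column names gives integer column labels, which makes it easy to check how many fields each row has before assigning the headers.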