Extract text from within parenthesis into pandas dataframe

Question

I am fairly new to scraping data with Python and am trying to pull the data from this page into a pandas DataFrame, using the column headers shown on that page.

Right now I have the following code, which pulls the data off the page, but I can't figure out the next steps to get the data into the format I need.

import requests

url = 'https://mspotrace.org.my/Opmc_list/getCBbyfilters'

r = requests.get(url)
page = r.text  # reuse the response instead of requesting the URL twice

Answer 1

You can read the tables from the URL directly with pandas.

>>> import pandas as pd
>>> url = 'https://mspotrace.org.my/Opmc_list'
>>> df = pd.read_html(url)
>>> df[0]

pandas' read_html reads all the tables on the page and returns a list of DataFrames. In your case there is only one table at that URL, so the desired DataFrame is at index 0.

EDIT

The endpoint in your question returns JSON rather than an HTML table, so try this:

>>> import json
>>> data = json.loads(page)  # page is the response text from the question's code
>>> df = pd.DataFrame(data)
>>> df
      draw  recordsTotal  recordsFiltered                                               data
0        0          2654             2654  [OPMC31001, Apave Malaysia Sdn Bhd, Part 3, Ka...
1        0          2654             2654  [OPMC31002, Apave Malaysia Sdn Bhd, Part 3, Ko...
2        0          2654             2654  [OPMC31003, Apave Malaysia Sdn Bhd, Part 3, Ko...
3        0          2654             2654  [OPMC31004, Apave Malaysia Sdn Bhd, Part 3, Ko...
4        0          2654             2654  [OPMC31005, Apave Malaysia Sdn Bhd, Part 3, Ko...
...    ...           ...              ...                                                ...
2649     0          2654             2654  [SCCS2333, Trans Certification Interntional Sd...
2650     0          2654             2654  [SCCS2351, TUV Rheinland Malaysia Sdn. Bhd., S...
2651     0          2654             2654  [SCCS1636, DQS Certification (M) Sdn Bhd, SCCS...
2652     0          2654             2654  [SCCS2906, TUV NORD (MALAYSIA) SDN BHD, SCCS, ...
2653     0          2654             2654  [SCCS02085, BSI Services Malaysia Sdn Bhd, SCC...

[2654 rows x 4 columns]
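In the output above, each row's data cell still holds a whole list, so the fields are not yet in their own columns. A minimal sketch of one possible next step is to build the DataFrame from the "data" key alone. The payload below is a small stand-in for `page` that only mirrors the structure shown above (the real rows have more fields, which are truncated in the output), and the column names are hypothetical placeholders, not the headers from the site.

```python
import json

import pandas as pd

# Stand-in for `page` (the response text). It mirrors the structure seen in
# the output above; only the first three fields of each row are reproduced,
# because the rest are truncated there.
page = json.dumps({
    "draw": 0,
    "recordsTotal": 2,
    "recordsFiltered": 2,
    "data": [
        ["OPMC31001", "Apave Malaysia Sdn Bhd", "Part 3"],
        ["OPMC31002", "Apave Malaysia Sdn Bhd", "Part 3"],
    ],
})

payload = json.loads(page)

# Hypothetical column names: replace them with the headers shown on the page.
columns = ["code", "certification_body", "scheme"]

# payload["data"] is a list of lists, so each inner list becomes one row.
df = pd.DataFrame(payload["data"], columns=columns)
print(df)
```

With a live response, `r.json()` from requests should give the same dictionary as `json.loads(r.text)`. The number of names in `columns` has to match the number of fields in each row, otherwise pandas raises an error.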

solution1
1 ACCEPTED 2019-12-16 04:28:51