
Web Scraping data with BS4 - Python

I have been trying to export web-scraped data using the code below.

import pandas as pd
import requests
from bs4 import BeautifulSoup 

url="https://www.marketwatch.com/tools/markets/stocks/country/sri-lanka/1"

data  = requests.get(url).text
soup = BeautifulSoup(data, 'html5lib')

cse = pd.DataFrame(columns=["Name", "Exchange", "Sector"])
for row in soup.find('tbody').find('tr'): ##for row in soup.find("tbody").find_all('tr'):
    col = row.find("td")
    Name = col[0].text
    Exchange = col[1].text
    Sector = col[2].text
    cse = cse.append({"Name":Company_Name,"Exchange":Exchange_code,"Sector":Industry}, ignore_index=True) 

But I am receiving the error 'TypeError: 'int' object is not subscriptable'. Can anyone help me figure this out?

You need to know the difference between .find() and .find_all().

The difference is that find_all() returns a list of every matching element, while find() returns only the first match (or None if nothing matches).
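To make the difference concrete, here is a minimal sketch that runs offline. The HTML string is a made-up two-row stand-in for the real page (the cell values are illustrative only), and it uses the built-in "html.parser" so no extra parser package is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not the real MarketWatch page.
HTML = """
<table><tbody>
<tr><td>Abans Electricals PLC</td><td>XCOL</td><td>Housewares</td></tr>
<tr><td>ACL Cables PLC</td><td>XCOL</td><td>Industrial Electronics</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(HTML, "html.parser")
tbody = soup.find("tbody")

first_row = tbody.find("tr")      # one Tag: the first <tr> only
all_rows = tbody.find_all("tr")   # list-like ResultSet with every <tr>

first_cell = first_row.find("td").text                  # text of the first <td>
all_cells = [td.text for td in first_row.find_all("td")]  # text of all three <td>

print(type(first_row).__name__, len(all_rows))
print(first_cell)
print(all_cells)
```

Because find_all() gives back a list, indexing such as col[0] or looping with for works on its result, which is what the scraping loop needs.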

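Since you are using col = row.find("td"), col is not a list. On top of that, for row in soup.find('tbody').find('tr') loops over the children of the first <tr> only, and those children most likely include whitespace text nodes. On a text node, .find() is the ordinary string method, which returns an integer index (-1 when nothing is found), so col[0] raises 'TypeError: 'int' object is not subscriptable'.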
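The error can be reproduced without hitting the network. The sketch below assumes the real page has whitespace between its <td> tags, as the hypothetical markup here does; it is an illustration of the mechanism, not a capture of the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical single-row table with newlines between the cells.
HTML = """<table><tbody><tr>
<td>Abans Electricals PLC</td>
<td>XCOL</td>
<td>Housewares</td>
</tr></tbody></table>"""

soup = BeautifulSoup(HTML, "html.parser")
first_tr = soup.find("tbody").find("tr")

# Iterating over a Tag yields its direct children, text nodes included.
children = list(first_tr)
first_child = children[0]          # the "\n" text node before the first <td>

# A text node is a str subclass, so .find() here is str.find, not a tag search.
result = first_child.find("td")    # substring not found, so this is -1

try:
    result[0]                      # same operation as col[0] in the question
except TypeError as exc:
    message = str(exc)

print(type(first_child).__name__, result)
print(message)
```

Switching both calls to find_all() avoids this, because the loop then visits <tr> tags and each col is a list of <td> tags.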

Since you need to iterate over every <tr> and, in turn, every <td> inside each <tr>, you have to use find_all() in both places.

You can try this out.

import pandas as pd
import requests
from bs4 import BeautifulSoup 

url="https://www.marketwatch.com/tools/markets/stocks/country/sri-lanka/1"

data  = requests.get(url).text
soup = BeautifulSoup(data, 'lxml')

rows = []
for row in soup.find('tbody').find_all('tr'):
    col = row.find_all("td")  # list of every <td> cell in this row
    Company_Name = col[0].text
    Exchange_code = col[1].text
    Industry = col[2].text
    rows.append({"Name":Company_Name,"Exchange":Exchange_code,"Sector":Industry})

# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
cse = pd.DataFrame(rows, columns=["Name", "Exchange", "Sector"])
print(cse)

Output:

                                          Name  ...                       Sector
0           Abans Electricals PLC (ABAN.N0000)  ...                   Housewares
1               Abans Finance PLC (AFSL.N0000)  ...            Finance Companies
2           Access Engineering PLC (AEL.N0000)  ...                 Construction
3                   ACL Cables PLC (ACL.N0000)  ...       Industrial Electronics
4                ACL Plastics PLC (APLA.N0000)  ...          Industrial Products
..                                         ...  ...                          ...
145      Lanka Hospital Corp. PLC (LHCL.N0000)  ...         Healthcare Provision
146                 Lanka IOC PLC (LIOC.N0000)  ...             Specialty Retail
147     Lanka Milk Foods (CWE) PLC (LMF.N0000)  ...                Food Products
148  Lanka Realty Investments PLC (ASCO.N0000)  ...       Real Estate Developers
149               Lanka Tiles PLC (TILE.N0000)  ...  Building Materials/Products

[150 rows x 3 columns]

