Web Scraping data with BS4 - Python

I have been trying to export a web-scraped document using the code below.

import pandas as pd
import requests
from bs4 import BeautifulSoup 

url="https://www.marketwatch.com/tools/markets/stocks/country/sri-lanka/1"

data  = requests.get(url).text
soup = BeautifulSoup(data, 'html5lib')

cse = pd.DataFrame(columns=["Name", "Exchange", "Sector"])
for row in soup.find('tbody').find('tr'): ##for row in soup.find("tbody").find_all('tr'):
    col = row.find("td")
    Name = col[0].text
    Exchange = col[1].text
    Sector = col[2].text
    cse = cse.append({"Name":Company_Name,"Exchange":Exchange_code,"Sector":Industry}, ignore_index=True) 

But I am getting the error "TypeError: 'int' object is not subscriptable". Can anyone help me solve this?

You need to know the difference between .find() and .find_all().

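The only difference is that find_all() returns a list of every match (a list even when there is only a single result), while find() returns just the first match itself.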
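To make the difference concrete, here is a minimal sketch on a small hand-written HTML snippet. The table contents are made up for illustration and are not taken from the MarketWatch page; the built-in "html.parser" is used so no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table, used only for illustration.
html = """
<table><tbody>
<tr><td>Company A</td><td>EXCH</td><td>Retail</td></tr>
<tr><td>Company B</td><td>EXCH</td><td>Banking</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

first_row = soup.find("tr")        # a single Tag: only the first <tr>
all_rows = soup.find_all("tr")     # a list-like ResultSet with every <tr>

first_cell = first_row.find("td")  # a single Tag, so first_cell[0] is not a cell
cells = first_row.find_all("td")   # a list of the three <td> Tags

print(type(first_row).__name__, len(all_rows))
print(first_cell.text, [c.text for c in cells])
```

Only the find_all() results can be indexed by position (cells[0], cells[1], ...) or looped over row by row.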

Because your code uses col = row.find("td"), col is not a list, so indexing it with col[0] fails. That is why you get the error "TypeError: 'int' object is not subscriptable".
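A likely explanation for the 'int' in the message, sketched here in plain Python without BeautifulSoup: looping over the single tag returned by find('tr') yields its child nodes, which can include whitespace text nodes, and BeautifulSoup's text nodes are subclasses of str. Assuming the first child is such a text node, .find("td") is then the ordinary string method and returns an integer index. The variable names below are illustrative.

```python
# A whitespace text node between tags behaves like an ordinary string.
text_node = "\n    "

col = text_node.find("td")  # str.find returns -1 when the substring is absent
print(col)

try:
    col[0]                  # indexing an int is what raises the TypeError
except TypeError as exc:
    message = str(exc)
    print(message)
```

This is also why the error mentions an int rather than a missing tag: the failing object never was a table cell.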

Since you need to iterate over every <tr> and then over the <td> cells inside each <tr>, you have to use find_all() in both places.
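The nested loop can be sketched on a small inline table as follows. The HTML and the length check are assumptions added for illustration; the check simply skips rows that do not have the three expected cells, which may or may not occur on the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical table for illustration; the last row is deliberately incomplete.
html = """
<table><tbody>
<tr><td>Company A</td><td>EXCH</td><td>Retail</td></tr>
<tr><td>Company B</td><td>EXCH</td><td>Banking</td></tr>
<tr><td>Incomplete row</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

records = []
for row in soup.find("tbody").find_all("tr"):  # every <tr> in the body
    cells = row.find_all("td")                 # every <td> in this row
    if len(cells) < 3:
        continue                               # skip rows without three cells
    name, exchange, sector = (c.get_text(strip=True) for c in cells[:3])
    records.append((name, exchange, sector))

print(records)
```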

You can try this.

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.marketwatch.com/tools/markets/stocks/country/sri-lanka/1"

data = requests.get(url).text
soup = BeautifulSoup(data, 'lxml')

# Collect one dict per table row; find_all() returns every match as a list.
rows = []
for row in soup.find('tbody').find_all('tr'):
    col = row.find_all("td")
    rows.append({
        "Name": col[0].text,
        "Exchange": col[1].text,
        "Sector": col[2].text,
    })

# Build the DataFrame once; DataFrame.append was removed in pandas 2.0.
cse = pd.DataFrame(rows, columns=["Name", "Exchange", "Sector"])
print(cse)

Output:

                                          Name  ...                       Sector
0           Abans Electricals PLC (ABAN.N0000)  ...                   Housewares
1               Abans Finance PLC (AFSL.N0000)  ...            Finance Companies
2           Access Engineering PLC (AEL.N0000)  ...                 Construction
3                   ACL Cables PLC (ACL.N0000)  ...       Industrial Electronics
4                ACL Plastics PLC (APLA.N0000)  ...          Industrial Products
..                                         ...  ...                          ...
145      Lanka Hospital Corp. PLC (LHCL.N0000)  ...         Healthcare Provision
146                 Lanka IOC PLC (LIOC.N0000)  ...             Specialty Retail
147     Lanka Milk Foods (CWE) PLC (LMF.N0000)  ...                Food Products
148  Lanka Realty Investments PLC (ASCO.N0000)  ...       Real Estate Developers
149               Lanka Tiles PLC (TILE.N0000)  ...  Building Materials/Products

[150 rows x 3 columns]
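Since the question mentions exporting the scraped data, here is a minimal sketch of that last step, assuming CSV is the desired format. The rows are placeholders rather than real scraped values, and the file name "cse.csv" mentioned in the comment is hypothetical.

```python
import io

import pandas as pd

# Placeholder rows standing in for the scraped Name / Exchange / Sector values.
rows = [
    {"Name": "Company A", "Exchange": "EXCH", "Sector": "Retail"},
    {"Name": "Company B", "Exchange": "EXCH", "Sector": "Banking"},
]
cse = pd.DataFrame(rows, columns=["Name", "Exchange", "Sector"])

# Written to an in-memory buffer here; pass a path such as "cse.csv"
# (hypothetical name) instead of the buffer to write a file.
buffer = io.StringIO()
cse.to_csv(buffer, index=False)  # index=False leaves out the 0..N row labels
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```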
