Web Scraping data with BS4 - Python
I have been trying to export web-scraped data to a document using the following code.
import pandas as pd
import requests
from bs4 import BeautifulSoup
url="https://www.marketwatch.com/tools/markets/stocks/country/sri-lanka/1"
data = requests.get(url).text
soup = BeautifulSoup(data, 'html5lib')
cse = pd.DataFrame(columns=["Name", "Exchange", "Sector"])
for row in soup.find('tbody').find('tr'): ##for row in soup.find("tbody").find_all('tr'):
    col = row.find("td")
    Name = col[0].text
    Exchange = col[1].text
    Sector = col[2].text
    cse = cse.append({"Name":Company_Name,"Exchange":Exchange_code,"Sector":Industry}, ignore_index=True)
But I get the error "TypeError: 'int' object is not subscriptable". Can anyone help me solve this?
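You need to know the difference between .find() and .find_all().
The only difference is that find_all() returns a list of all matching results, while find() returns just the first result (or None if nothing matches).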
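To make the difference concrete, here is a minimal sketch on a small inline table. The HTML string is a hypothetical sample, not the MarketWatch page, and the built-in "html.parser" is used only so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the real page.
html = (
    "<table><tbody>"
    "<tr><td>A</td><td>B</td></tr>"
    "<tr><td>C</td><td>D</td></tr>"
    "</tbody></table>"
)
soup = BeautifulSoup(html, "html.parser")

first_cell = soup.find("td")      # a single Tag: the first match
all_cells = soup.find_all("td")   # a list-like collection of every match

print(first_cell.text)                     # A
print([cell.text for cell in all_cells])   # ['A', 'B', 'C', 'D']

# When nothing matches, find() gives None and find_all() gives an empty list.
print(soup.find("th"))             # None
print(len(soup.find_all("th")))    # 0
```

Only the result of find_all() can safely be indexed with [0], [1], [2] to pick out individual cells.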
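Since you are using col = row.find("td"), col is not a list. In your loop, soup.find('tbody').find('tr') returns only the first <tr>, so the for loop walks over the children of that single row instead of over all rows. When one of those children is a text node (a string), .find("td") is most likely resolved as the ordinary string method, which returns an integer, and indexing that integer with col[0] gives the error 'TypeError: 'int' object is not subscriptable'.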
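One way to see where the int in the message can come from is to reproduce it with a plain string. This sketch assumes that the whitespace between tags reaches the loop as a text node that behaves like a regular Python string; the variable name text_node is only illustrative.

```python
# A whitespace text node, modelled here as an ordinary string (assumption).
text_node = "\n    "

# On a string, .find() is str.find: it returns the index of the
# substring, or -1 when the substring is absent.
col = text_node.find("td")
print(col)  # -1

try:
    col[0]  # indexing an int is what raises the error
except TypeError as exc:
    message = str(exc)
    print(message)  # 'int' object is not subscriptable
```

So the error does not come from the table cells themselves, but from calling .find() on something that is not a tag.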
Since you need to iterate over every <tr> and then go into the <td> cells inside each <tr>, you have to use find_all().
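The nested iteration can be sketched offline on a small inline table. The HTML, the company names and the exchange code below are made up for illustration, and the length check is an extra safeguard that the original code does not have.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table standing in for the real page.
html = """
<table><tbody>
  <tr><td>Alpha PLC</td><td>XCOL</td><td>Finance</td></tr>
  <tr><td>Beta PLC</td><td>XCOL</td><td>Construction</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for row in soup.find("tbody").find_all("tr"):   # every <tr> in the body
    col = row.find_all("td")                    # every <td> in this row
    if len(col) < 3:
        continue  # skip rows that do not have the three expected cells
    rows.append({
        "Name": col[0].text.strip(),
        "Exchange": col[1].text.strip(),
        "Sector": col[2].text.strip(),
    })

print(rows)
```

Because find_all() is used at both levels, whitespace text nodes between the tags are never visited, and col is always a list that can be indexed.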
You can try this.
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.marketwatch.com/tools/markets/stocks/country/sri-lanka/1"
data = requests.get(url).text
soup = BeautifulSoup(data, 'lxml')

# Collect one dict per table row and build the DataFrame once at the end.
# (DataFrame.append was removed in pandas 2.0.)
rows = []
for row in soup.find('tbody').find_all('tr'):
    col = row.find_all("td")  # list of the <td> cells in this row
    Company_Name = col[0].text
    Exchange_code = col[1].text
    Industry = col[2].text
    rows.append({"Name": Company_Name, "Exchange": Exchange_code, "Sector": Industry})

cse = pd.DataFrame(rows, columns=["Name", "Exchange", "Sector"])
print(cse)
Name ... Sector
0 Abans Electricals PLC (ABAN.N0000) ... Housewares
1 Abans Finance PLC (AFSL.N0000) ... Finance Companies
2 Access Engineering PLC (AEL.N0000) ... Construction
3 ACL Cables PLC (ACL.N0000) ... Industrial Electronics
4 ACL Plastics PLC (APLA.N0000) ... Industrial Products
.. ... ... ...
145 Lanka Hospital Corp. PLC (LHCL.N0000) ... Healthcare Provision
146 Lanka IOC PLC (LIOC.N0000) ... Specialty Retail
147 Lanka Milk Foods (CWE) PLC (LMF.N0000) ... Food Products
148 Lanka Realty Investments PLC (ASCO.N0000) ... Real Estate Developers
149 Lanka Tiles PLC (TILE.N0000) ... Building Materials/Products
[150 rows x 3 columns]