Web Scraping data with BS4 - Python

I have been trying to export a web-scraped document from the code below.

import pandas as pd
import requests
from bs4 import BeautifulSoup 

url="https://www.marketwatch.com/tools/markets/stocks/country/sri-lanka/1"

data  = requests.get(url).text
soup = BeautifulSoup(data, 'html5lib')

cse = pd.DataFrame(columns=["Name", "Exchange", "Sector"])
for row in soup.find('tbody').find('tr'): ##for row in soup.find("tbody").find_all('tr'):
    col = row.find("td")
    Name = col[0].text
    Exchange = col[1].text
    Sector = col[2].text
    cse = cse.append({"Name":Company_Name,"Exchange":Exchange_code,"Sector":Industry}, ignore_index=True) 

but I am receiving the error 'TypeError: 'int' object is not subscriptable'. Can anyone help me figure this out?

You need to know the difference between .find() and .find_all().

find_all() returns a list of every matching element, while find() returns just the first match.

In your code, soup.find('tbody').find('tr') returns a single Tag, so the loop iterates over that tag's children rather than over the table rows; for the text nodes among those children, row.find("td") resolves to the string method str.find(), which returns an int, and indexing that int is what raises 'TypeError: 'int' object is not subscriptable'.

Since you need to iterate over all the <tr> elements, and in turn over the <td> elements inside each <tr>, you have to use find_all() in both places.
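To see the difference concretely, here is a minimal sketch (the tiny HTML string is made up purely for illustration):

from bs4 import BeautifulSoup

# Made-up two-row table, just to contrast find() and find_all()
html = "<table><tbody><tr><td>A</td><td>B</td></tr><tr><td>C</td><td>D</td></tr></tbody></table>"
soup = BeautifulSoup(html, "html.parser")

first_row = soup.find("tbody").find("tr")      # a single Tag (the first <tr>)
all_rows = soup.find("tbody").find_all("tr")   # a list of Tags (every <tr>)

print(type(first_row))                     # <class 'bs4.element.Tag'>
print(len(all_rows))                       # 2
print(all_rows[1].find_all("td")[0].text)  # 'C' -- indexing works on the list from find_all()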

You can try this out:

import pandas as pd
import requests
from bs4 import BeautifulSoup 

url="https://www.marketwatch.com/tools/markets/stocks/country/sri-lanka/1"

data = requests.get(url).text
soup = BeautifulSoup(data, 'lxml')

cse = pd.DataFrame(columns=["Name", "Exchange", "Sector"])
for row in soup.find('tbody').find_all('tr'):   # find_all() gives a list of every <tr>
    col = row.find_all("td")                    # find_all() again, so col is an indexable list of <td> Tags
    Company_Name = col[0].text
    Exchange_code = col[1].text
    Industry = col[2].text
    cse = cse.append({"Name":Company_Name,"Exchange":Exchange_code,"Sector":Industry}, ignore_index=True)
Output:

                                          Name  ...                       Sector
0           Abans Electricals PLC (ABAN.N0000)  ...                   Housewares
1               Abans Finance PLC (AFSL.N0000)  ...            Finance Companies
2           Access Engineering PLC (AEL.N0000)  ...                 Construction
3                   ACL Cables PLC (ACL.N0000)  ...       Industrial Electronics
4                ACL Plastics PLC (APLA.N0000)  ...          Industrial Products
..                                         ...  ...                          ...
145      Lanka Hospital Corp. PLC (LHCL.N0000)  ...         Healthcare Provision
146                 Lanka IOC PLC (LIOC.N0000)  ...             Specialty Retail
147     Lanka Milk Foods (CWE) PLC (LMF.N0000)  ...                Food Products
148  Lanka Realty Investments PLC (ASCO.N0000)  ...       Real Estate Developers
149               Lanka Tiles PLC (TILE.N0000)  ...  Building Materials/Products

[150 rows x 3 columns]
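One caveat worth noting: DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above will fail on a recent pandas. A common workaround, sketched here under that assumption, is to collect the rows in a plain list and build the DataFrame once at the end:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.marketwatch.com/tools/markets/stocks/country/sri-lanka/1"

data = requests.get(url).text
soup = BeautifulSoup(data, "lxml")

records = []  # collect plain dicts, then build the DataFrame in one go
for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")
    records.append({"Name": col[0].text, "Exchange": col[1].text, "Sector": col[2].text})

cse = pd.DataFrame(records, columns=["Name", "Exchange", "Sector"])
print(cse)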
