Scraping a list of URLs using BeautifulSoup and converting the data to CSV
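I am new to Python and have the following questions.
First, I have a list of URLs that I want to scrape data from. I don't know what is wrong with my code, but I can't retrieve results from all of the URLs: it scrapes only the first URL and not the rest. How can I successfully scrape the data (title, info, description, application) from every URL in the list?
Second, once the first part works, how can I convert the data to a CSV file?
Here is the code: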
import requests
from bs4 import BeautifulSoup
import lxml
import pandas as pd
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError

urlList = ["url1", "url2", "url3"]  # ... list of URLs ...
for url in urlList:
    try:
        html = urlopen(url)
    except HTTPError as e:
        print(e)
    except URLError:
        print("error")
    else:
        soup = BeautifulSoup(html.read(), "html5lib")

# Scraping
def getTitle():
    for title in soup.find('h2', class_="xx").text:
        print(title)

def getInfo():
    for info in soup.find('ul', class_="j-k-i").text:
        print(info)

def getDescription():
    for description in soup.find('div', class_="b-d").text:
        print(description)

def getApplication():
    for application in soup.find('div', class_="g-b bm-b-30").text:
        print(application)

for soups in soup():
    getTitle()
    getInfo()
    getDescription()
    getApplication()
Try the following:
from bs4 import BeautifulSoup
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError

# Each function receives the parsed page as an argument instead of relying on a global.
# .text is already a string, so it is printed directly rather than character by character.
def getTitle(soup):
    print(soup.find('h2', class_="xx").text)

def getInfo(soup):
    print(soup.find('ul', class_="j-k-i").text)

def getDescription(soup):
    print(soup.find('div', class_="b-d").text)

def getApplication(soup):
    print(soup.find('div', class_="g-b bm-b-30").text)

urlList = ["url1", "url2", "url3"]  # ... list of URLs ...
for url in urlList:
    try:
        html = urlopen(url)
    except HTTPError as e:
        print(e)
    except URLError:
        print("error")
    else:
        # Parse and process the page inside the loop so that every URL is handled
        soup = BeautifulSoup(html.read(), "html5lib")
        getTitle(soup)
        getInfo(soup)
        getDescription(soup)
        getApplication(soup)
This passes the current soup to each function, so every call works on the page that was just parsed for that URL.