Web scraping tables and data with Python BeautifulSoup
I have scraped the data from this table, across all pages of this website, into a dictionary using Python and BeautifulSoup, as shown in the code below.
However, each company also has its own separate page, and I am trying to scrape the data from that page into the same dictionary as well.
import requests
from bs4 import BeautifulSoup
from pprint import pprint

company_data = []
for i in range(1, 3):
    # Fetch listing page i
    page = requests.get(f'https://web.archive.org/web/20121007172955/http://www.nga.gov/collection/anZ1.htm{i}?')
    soup = BeautifulSoup(page.text, "lxml")
    rows = soup.select('div.accordion_heading.panel-group.s_list_table')
    for row_info in rows:
        company_info = {}
        company_info['Name'] = row_info.select_one('div.col_1 a').text.strip()
        # Store the row; without this the final list stays empty
        company_data.append(company_info)
pprint(company_data)
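One way to reach each company's own page is to read the `href` of the link in the same row and resolve it against the listing URL. The sketch below assumes the `div.col_1 a` anchor used above points to the company page, which the question does not confirm. The helper `parse_listing`, the `URL` key, the `example.com` addresses and the sample markup are all hypothetical and serve only as illustration.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def parse_listing(html, base_url):
    """Return one dict per table row with the company name and detail-page URL."""
    soup = BeautifulSoup(html, "html.parser")
    companies = []
    for row in soup.select("div.accordion_heading.panel-group.s_list_table"):
        link = row.select_one("div.col_1 a")
        # Skip rows that have no link or a link without an href
        if link is None or not link.get("href"):
            continue
        companies.append({
            "Name": link.get_text(strip=True),
            # Resolve relative links against the page they were found on
            "URL": urljoin(base_url, link["href"]),
        })
    return companies

# Hypothetical markup that mimics one row of the listing
sample_html = """
<div class="accordion_heading panel-group s_list_table">
  <div class="col_1"><a href="/members/2m"> 2M </a></div>
</div>
"""
rows = parse_listing(sample_html, "https://example.com/list?page=1")
print(rows)
```

Keeping the parsing separate from the HTTP request makes it possible to try the selectors on a saved copy of the page before running the full loop.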
I have done this for only one company, 2M; I believe it helps.
import requests
from bs4 import BeautifulSoup

res = requests.get("https://web.archive.org/web/20121007172955/http://www.nga.gov/collection/anZ1.htm").text
soup = BeautifulSoup(res, 'html.parser')
company_info = {}
# When present, this stores the list of matching Tag objects, not their text
company_info['Profile'] = soup.select('div.text-desc-members')
if len(company_info['Profile']) == 0:
    # Fall back to the text of the first div.list-sub element
    company_info['Profile'] = soup.select('div.list-sub')[0].text.strip()
# Links to the files in the striped table
company_info['ACOP'] = [item['href'] for item in soup.select(".table.table-striped a.files")]
# Pair each question with its answer in document order
questions = soup.select("div.list-reports .m_question")
answers = soup.select("div.list-reports .m_answer")
company_info['QuestionAnswer'] = ["Question:" + q.text.strip() + " Answer:" + a.text.strip() for q, a in zip(questions, answers)]
print(company_info)
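To cover every company rather than just one, the same selectors could be wrapped in a function and applied to each company's page, merging the result into that company's dictionary. This is a minimal sketch, not a tested scraper for the site: `parse_company_page`, `add_details`, the `fetch` parameter, the `URL` key and the sample markup are hypothetical names introduced here, and it assumes each dictionary already holds the address of its detail page.

```python
from bs4 import BeautifulSoup

def parse_company_page(html):
    """Extract profile text, ACOP links and Q&A pairs from one company page."""
    soup = BeautifulSoup(html, "html.parser")
    # Try the primary profile container first, then the fallback one
    profile = soup.select_one("div.text-desc-members")
    if profile is None:
        profile = soup.select_one("div.list-sub")
    questions = soup.select("div.list-reports .m_question")
    answers = soup.select("div.list-reports .m_answer")
    return {
        "Profile": profile.get_text(strip=True) if profile is not None else "",
        "ACOP": [a["href"] for a in soup.select(".table.table-striped a.files")
                 if a.has_attr("href")],
        "QuestionAnswer": [
            f"Question: {q.get_text(strip=True)} Answer: {a.get_text(strip=True)}"
            for q, a in zip(questions, answers)
        ],
    }

def add_details(companies, fetch):
    """Merge detail-page fields into each dict; `fetch` maps a URL to its HTML."""
    for company in companies:
        company.update(parse_company_page(fetch(company["URL"])))
    return companies

# Hypothetical detail page and listing entry, used instead of a live request
sample_page = """
<div class="list-sub"> Shipping line </div>
<table class="table table-striped">
  <tr><td><a class="files" href="/files/acop.pdf">ACOP</a></td></tr>
</table>
<div class="list-reports">
  <div class="m_question"> Q1 </div><div class="m_answer"> A1 </div>
</div>
"""
companies = [{"Name": "2M", "URL": "https://example.com/members/2m"}]
result = add_details(companies, fetch=lambda url: sample_page)
print(result)
```

Against the live site, `fetch` could be something like `lambda url: requests.get(url).text`; passing it in as a parameter keeps the parsing testable without network access.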