Web Scraping - Python; Writing to a CSV
I am trying to write data from a website to a CSV file. The data is listed as a table in HTML, with the tag '' marking each new block of data in the rankings and '' marking each descriptive item about the element in the ranking. The list is a ranking of the top 500 computers, listed 1-100, with each ranked item (1, 2, 3, 4, etc.) marked by '' and each characteristic of the computer (its storage, max power, etc.) marked by ''.
Here is my code:
import requests
from bs4 import BeautifulSoup

# read the data from a URL
url = requests.get("https://www.top500.org/list/2018/06/")
url.status_code
url.content

# parse the page using Beautiful Soup
soup = BeautifulSoup(url.content, 'html.parser')

filename = "computerRank10.csv"
f = open(filename, "w")
headers = "Rank, Site, System, Cores, RMax, RPeak, Power\n"
f.write(headers)

for record in soup.findAll('tr'):
    # start building the record with an empty string
    tbltxt = ""
    tbltxt = tbltxt + data.text + ";"
    tbltxt = tbltxt.replace('\n', ' ')
    tbltxt = tbltxt.replace(',', '')
    # f.write(tbltxt[0:-1] + '\n')
    f.write(tbltxt + '\n')
f.close()
I'm getting nothing, and my CSV file is always blank.
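In the code as posted, `data` is never assigned before `data.text` is used, so the loop would stop with a NameError on its first pass and `f.close()` would never be reached. One plausible reason for the empty file, assuming the script runs in a session that stays open afterwards, is that the buffered header is never flushed to disk. The sketch below uses made-up rows instead of the scraped page to show how a `with` block closes the file even when the loop fails; `write_rows` and the sample values are hypothetical and not part of the original code.

```python
import os
import tempfile


def write_rows(path, rows):
    """Write a header and semicolon-joined rows; the file is closed even on error."""
    with open(path, "w") as f:
        f.write("Rank;Site\n")
        for row in rows:
            # A non-string cell makes join() raise TypeError, standing in for
            # whatever error interrupts the real scraping loop.
            f.write(";".join(row) + "\n")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.csv")
    try:
        write_rows(path, [["1", "Example Site"], ["2", None]])
    except TypeError:
        pass
    # Everything written before the error is on disk because the
    # with block closed (and therefore flushed) the file.
    with open(path) as f:
        print(f.read())
```

With a bare `open()` and a trailing `f.close()`, the same failure would leave the file open and its contents possibly still in the write buffer.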
Try the script below. It should fetch all the data across the pages and write it to a CSV file:
import csv
import requests
from bs4 import BeautifulSoup

link = "https://www.top500.org/list/2018/06/?page={}"

def get_data(link, writer):
    # request pages 1 through 5 of the list
    for url in [link.format(page) for page in range(1, 6)]:
        res = requests.get(url)
        soup = BeautifulSoup(res.text, "lxml")
        for items in soup.select("table.table tr"):
            # collect the text of every header and data cell in the row
            td = [item.get_text(strip=True) for item in items.select("th,td")]
            writer.writerow(td)

if __name__ == '__main__':
    # if an encoding issue comes up, use open("tabularitem.csv", "w", newline="", encoding="utf-8")
    with open("tabularitem.csv", "w", newline="") as outfile:
        writer = csv.writer(outfile)
        get_data(link, writer)
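To see which pages the loop in this answer requests, the URL template can be expanded without touching the network. This is only a minimal sketch: `page_urls` is a hypothetical helper name that does not appear in the answer, and the default of five pages simply mirrors the `range(1, 6)` used there.

```python
# Template taken from the answer; {} is filled with the page number.
LINK = "https://www.top500.org/list/2018/06/?page={}"


def page_urls(template, first=1, last=5):
    """Return the URLs for pages first..last inclusive (hypothetical helper)."""
    return [template.format(page) for page in range(first, last + 1)]


if __name__ == "__main__":
    for url in page_urls(LINK):
        print(url)
```

Keeping the URL construction separate from the requests makes it easy to check the page range before any scraping starts.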
You should use the csv module from the Python standard library.
Here is a simpler solution:
import csv
import requests
from bs4 import BeautifulSoup as bs

url = requests.get("https://www.top500.org/list/2018/06")
soup = bs(url.content, 'html.parser')

filename = "computerRank10.csv"
# newline="" stops the csv module from adding blank lines between rows on Windows
with open(filename, 'w', newline="") as f:
    csv_writer = csv.writer(f)
    for tr in soup.find_all("tr"):
        data = []
        # header cells (found only once, in the first row)
        for th in tr.find_all("th"):
            data.append(th.text)
        if data:
            print("Inserting headers : {}".format(','.join(data)))
            csv_writer.writerow(data)
            continue
        # data cells: prefer the link text when a cell contains a link
        for td in tr.find_all("td"):
            if td.a:
                data.append(td.a.text.strip())
            else:
                data.append(td.text.strip())
        if data:
            print("Inserting data: {}".format(','.join(data)))
            csv_writer.writerow(data)
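One reason to prefer the csv module over joining strings by hand is that it quotes cells containing the delimiter, so commas do not have to be stripped from the text as the question's code does. The sketch below illustrates this with an in-memory buffer; the sample rows are invented for the example and are not taken from the site.

```python
import csv
import io

# Hypothetical sample rows; the values are made up for illustration only.
rows = [
    ["Rank", "Site", "Cores"],
    ["1", "Example Lab, Country", "1,000"],
]

# Write to an in-memory buffer instead of a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back recovers the original cells, commas included.
parsed = list(csv.reader(io.StringIO(text)))
```

Cells that contain a comma are wrapped in double quotes on output, and `csv.reader` removes the quotes again when the file is read.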