I want to extract a CSV file from a webpage using Python (web scraping)

Question

I want to get the .csv file, or the .xlsx file, from this webpage. I thought about web scraping with BeautifulSoup, but that seems inefficient. I want to write a function that, when given this webpage, locates the links to the CSV files and returns the CSV file to me.

This is so that I can then run an analysis on the CSV file.
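A minimal sketch of the function described above, using only the standard library's `html.parser` in place of BeautifulSoup (a swapped-in technique, not something the question or answer uses). `find_data_links` and `_LinkCollector` are hypothetical names. It assumes the download links appear as ordinary `<a href>` tags in the HTML the server returns; links inserted later by JavaScript would not be found.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_data_links(html, base_url, extensions=(".csv", ".xlsx")):
    """Return absolute, de-duplicated URLs whose path ends with one of `extensions`."""
    wanted = tuple(ext.lower() for ext in extensions)
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    links = []
    for href in parser.hrefs:
        # Resolve relative links such as "/download/..." against the page URL.
        absolute = urljoin(base_url, href)
        # Check only the path, so query strings like "?v=1" do not hide the extension.
        path = urlparse(absolute).path.lower()
        if path.endswith(wanted) and absolute not in links:
            links.append(absolute)
    return links
```

For example, `find_data_links(html, "https://data.london.gov.uk/dataset/recorded_crime_rates")`, where `html` is the decoded page source; each returned URL can then be downloaded with `urllib.request.urlopen`.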

Could someone please help me out here?

Here's the link: https://data.london.gov.uk/dataset/recorded_crime_rates

Answer 1

Use the urllib library to fetch the page source, find the CSV link in it, and then download the file.
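Fetching the source with urllib returns bytes, and calling `str()` on bytes yields their repr (`b'...'` with escapes such as `\r\n` as literal text) rather than the decoded text. A minimal sketch of fetching and decoding properly, assuming the server declares its charset in the `Content-Type` header and falling back to UTF-8 otherwise; `fetch_text` and `decode_body` are hypothetical helper names.

```python
import urllib.request
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Turn raw response bytes into text.

    Without a declared charset, fall back to "utf-8-sig", which also
    drops a leading byte-order mark if one is present.
    """
    return raw.decode(charset or "utf-8-sig", errors="replace")


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download `url` and decode it with the charset from Content-Type, if any."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset()  # None if not declared
        return decode_body(response.read(), charset)


if __name__ == "__main__":
    raw = b"a,b\r\n1,2\r\n"
    print(str(raw))          # repr form; the \r\n escapes are literal characters
    print(decode_body(raw))  # real text with real line breaks
```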

This seems to work:

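import urllib.request
from urllib.parse import urljoin

url = 'https://data.london.gov.uk/dataset/recorded_crime_rates'
csvfile = r"C:\Tmp\CrimeRates.csv"

# open main page and decode the bytes (str(bytes) would give the b'...' repr)
with urllib.request.urlopen(url) as response:
    wc = response.read().decode("utf-8", errors="replace")

# get csv URL: locate the file name, then look for its download path
# starting 200 characters before it
target = "crime%20rates.csv"
i = wc.find(target)
i2 = wc.find("/download/recorded_crime_rates", max(i - 200, 0))
if i == -1 or i2 == -1 or i2 > i:
    raise RuntimeError("CSV link not found; the page markup may have changed")
csvURL = urljoin(url, wc[i2:i + len(target)])
print(csvURL)

# get csv as raw bytes
with urllib.request.urlopen(csvURL) as csvresp:
    csvdata = csvresp.read()
print(len(csvdata), "bytes")

# save csv to file; writing bytes keeps the original encoding and line endings
print("Saving To", csvfile)
with open(csvfile, "wb") as f:
    f.write(csvdata)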
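The last step can also be wrapped in a small reusable function. A minimal sketch, assuming the file should be saved byte-for-byte rather than decoded and re-encoded; `download_file` is a hypothetical name, and `shutil.copyfileobj` streams the body in chunks so large files are not held in memory all at once.

```python
import shutil
import urllib.request


def download_file(url, dest_path, timeout=60.0):
    """Stream the response body to dest_path unchanged and return the byte count."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, no decoding
        return out.tell()                  # file position == bytes written
```

With the answer's variables this would be `download_file(csvURL, csvfile)`.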

(Answer 1: score 0, posted 2020-07-21 15:07:51)