I want to extract the CSV file from a webpage using Python (web scraping)
I want to get the .csv file, or the .xlsx file, from this webpage. I thought about web scraping with BeautifulSoup, but this seems inefficient.
I want to write a function that, given this webpage, locates the links to the CSV files and returns the CSV file to me.
This is so that I can then run an analysis on the CSV file.
Could someone please help me out here?
Here's the link: https://data.london.gov.uk/dataset/recorded_crime_rates
Use the urllib library to get the source of the webpage, then search that source for the CSV link.
This seems to work:
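import urllib.request

url = 'https://data.london.gov.uk/dataset/recorded_crime_rates'
csvfile = r"C:\Tmp\CrimeRates.csv"
csvname = "crime%20rates.csv"

# Open the main page and decode the bytes into text.
# (str() on bytes would give the "b'...'" repr rather than the page text.)
with urllib.request.urlopen(url) as response:
    wc = response.read().decode("utf-8", errors="replace")

# Get the CSV URL: find the file name, then look for the start of the
# download path within the 200 characters before it.
i = wc.find(csvname)
if i == -1:
    raise ValueError("CSV link not found in page source")
i2 = wc.find("/download/recorded_crime_rates", max(i - 200, 0))
if i2 == -1 or i2 > i:
    raise ValueError("download path not found before the CSV file name")
csvURL = "https://data.london.gov.uk" + wc[i2:i + len(csvname)]
print(csvURL)

# Get the CSV as raw bytes
with urllib.request.urlopen(csvURL) as csvresp:
    csvdata = csvresp.read()
print(len(csvdata), "bytes")

# Save the CSV to file; binary mode keeps the downloaded bytes unchanged
print("Saving to", csvfile)
with open(csvfile, "wb") as f:
    f.write(csvdata)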
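
If you would rather not hard-code the file name and the 200-character search window, the same idea can be wrapped in a function that parses the page's links. This is only a rough sketch using the standard library; the names find_data_links, fetch_csv_rows and _LinkCollector are made up for illustration. It assumes the download links appear as ordinary <a href="..."> tags in the static HTML, which I have not verified for this page; if the links are built by JavaScript or embedded in a script block, it will find nothing and the string search above is the fallback.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_data_links(html, base_url, extensions=(".csv", ".xlsx")):
    """Return absolute URLs of links whose path ends in one of `extensions`."""
    parser = _LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        path = urlparse(absolute).path.lower()  # ignore any query string
        if path.endswith(tuple(extensions)) and absolute not in links:
            links.append(absolute)
    return links


def fetch_csv_rows(page_url):
    """Download the first CSV linked from page_url; return rows as lists.

    Assumes the CSV is UTF-8 encoded, which may not hold for every file.
    """
    with urllib.request.urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    csv_links = find_data_links(html, page_url, extensions=(".csv",))
    if not csv_links:
        raise ValueError("no CSV link found on " + page_url)
    with urllib.request.urlopen(csv_links[0]) as response:
        text = response.read().decode("utf-8-sig", errors="replace")
    return list(csv.reader(io.StringIO(text)))
```

Calling fetch_csv_rows('https://data.london.gov.uk/dataset/recorded_crime_rates') would then give you the rows directly for analysis, provided the assumption about the page's HTML holds. Keeping find_data_links separate from the download step means you can check it against a saved copy of the page without hitting the network.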