
How to scrape data from a specific website

I am trying to scrape data from a website so I can practice data analysis, but I am having trouble with one site in particular. The site publishes police reports for the Seattle area. I have read plenty of articles and could not find the answer. The URL is: https://data.seattle.gov/Public-Safety/real-time-911/nvqc-w7eg

I know that I should use Beautiful Soup, search for a keyword, and then convert the result to text. However, I keep getting None.

import requests
from bs4 import BeautifulSoup

URL = 'https://data.seattle.gov/Public-Safety/real-time-911/nvqc-w7eg'
page = requests.get(URL)
soup = BeautifulSoup(page.content)

My goal is to transform the table into a .csv file. Could someone help me please?

Try passing the parser name explicitly:

soup = BeautifulSoup(page.text, "html.parser")

To create the BeautifulSoup object, use any of the following:

soup = BeautifulSoup(page.text, "lxml")

or,

soup = BeautifulSoup(page.text, "html.parser")

or,

soup = BeautifulSoup(page.content, 'html.parser')

I think you should use lxml because it is faster. Note that lxml is a separate package that has to be installed, while html.parser ships with Python.
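Because lxml may not be installed everywhere, one option is to try it first and fall back to the built-in parser. This is only a minimal sketch: `make_soup` is a hypothetical helper name, not part of BeautifulSoup, and the markup is a made-up sample rather than the real page.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse markup with lxml if it is installed, otherwise with html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is not available; html.parser is part of the standard library.
        return BeautifulSoup(markup, "html.parser")


# Made-up sample markup, used here instead of page.text.
soup = make_soup("<html><body><p>hello</p></body></html>")
print(soup.p.get_text())
```

With a real response you would call `make_soup(page.text)` instead of passing the sample string.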

Parse the HTML with either lxml or html.parser. Each has its own advantages; lxml, for example, is very fast.

import requests
from bs4 import BeautifulSoup

URL = 'https://data.seattle.gov/Public-Safety/real-time-911/nvqc-w7eg'
page = requests.get(URL)
soup = BeautifulSoup(page.text, "html.parser")  # or "lxml"; html.parser is just an example

Instead of passing only the markup, pass the parser name as a string in the second argument so that BeautifulSoup knows which parser to use.
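Once the soup is built, turning a table into CSV could look roughly like the sketch below. It assumes the downloaded HTML actually contains a `<table>` element, which may not be true for this page: if the table is filled in by JavaScript after the page loads, `find` returns None whichever parser is used. `SAMPLE_HTML` and `table_to_csv` are made-up names for illustration, and the sample rows are not real data from the site.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up sample markup standing in for page.text; the structure of the
# real page is an assumption and may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Address</th><th>Type</th></tr>
  <tr><td>1st Ave</td><td>Aid Response</td></tr>
  <tr><td>Pine St</td><td>Medic Response</td></tr>
</table>
"""


def table_to_csv(html):
    """Return the first <table> in html as CSV text, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        # The markup has no <table>, for example because the page builds
        # it with JavaScript after loading.
        return None
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in table.find_all("tr"):
        # Header cells (<th>) and data cells (<td>) are treated alike.
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        writer.writerow(cells)
    return buffer.getvalue()


csv_text = table_to_csv(SAMPLE_HTML)
print(csv_text)
```

To save the result, write `csv_text` to a file opened with `open("output.csv", "w", newline="")`. If the function returns None for the real page, the table is probably not in the static HTML at all, and changing the parser will not help.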


 