Scraping tables from a webpage into Python

I am learning Spanish, and to help me learn the different verbs and their conjugations I am making some flash cards to use on my phone.

I am trying to scrape the data from a web page; here is an example page for one verb. On the page there are a few tables, and I am interested in the first five (Present, Future, Imperfect, Preterite and Conditional) near the top.

I have heard that BeautifulSoup is good for these types of projects. However, when I use the prettify method I can't find the tables anywhere in the text. I think I'm missing something. How can I get these tables into Python?

 import requests
 from bs4 import BeautifulSoup
 import re

 URL = 'https://www.linguasorb.com/spanish/verbs/conjugation/tener.html'
 page = requests.get(URL)
 soup = BeautifulSoup(page.content, 'html.parser')
 txt = soup.prettify()

You're loading the wrong URL. Remove the ".html" from the URL variable and you will be able to find the tables (they're actually lists) in the output: soup.find_all('div', class_='vPos')
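
Putting that together, here is a minimal sketch of the corrected script. The URL without ".html" and the 'vPos' class come from the answer above; the helper names extract_blocks and fetch_blocks are made up for illustration, and the inner markup of each div is not assumed, so the code simply collects whatever text each div contains. Taking the first five blocks assumes they line up with the five tenses mentioned in the question, which has not been checked against the live page.

```python
import requests
from bs4 import BeautifulSoup

# URL from the question with the trailing ".html" removed, as suggested above
URL = 'https://www.linguasorb.com/spanish/verbs/conjugation/tener'


def extract_blocks(html, css_class='vPos'):
    """Return the text fragments of every <div class=css_class>, one list per div.

    Hypothetical helper: it makes no assumption about the tags inside each div.
    """
    soup = BeautifulSoup(html, 'html.parser')
    return [list(div.stripped_strings)
            for div in soup.find_all('div', class_=css_class)]


def fetch_blocks(url=URL):
    """Download the page and extract the conjugation blocks (hypothetical helper)."""
    page = requests.get(url, timeout=10)
    page.raise_for_status()  # fail loudly on a 404 instead of parsing an error page
    return extract_blocks(page.content)


if __name__ == '__main__':
    # Assumption: the first five blocks are the tenses of interest
    for block in fetch_blocks()[:5]:
        print(block)
```

Each block comes back as a plain list of strings, which should be straightforward to turn into flash-card rows once you have looked at what the page actually returns.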
