
Extracting HTML tables and storing them in separate files

I wrote some code to extract parts of tables, but I want to extract every <table> tag from the input and then store each one in a separate HTML file.

from bs4 import BeautifulSoup

soup = BeautifulSoup(myInput)
table = soup.find('table', {'class': '*'})

I expect the code to show me all the tables contained in the input text, but it outputs an error because the * is not defined.

EDIT: by * I mean every table in the file, like the wildcard in *.txt

class is the attribute you are filtering on, so you have to tell soup which class value to match. The '*' is not treated as a wildcard; it is compared literally against the class name. Take this HTML:

<table class='HiClass'>
A
</table>
<table class='MiClass'>
B
</table>
<table class='*'>
C
</table>

For instance,

table1 = soup.find('table', {'class': '*'})
table2 = soup.find('table', {'class': 'HiClass'})

You'll get the "C" table in table1 and the "A" table in table2.
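This is easy to check. The following is a minimal sketch, assuming the three-table snippet is held in a string (the name html_doc is made up for illustration) and parsed with the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical input: the three-table snippet from this answer.
html_doc = """
<table class='HiClass'>A</table>
<table class='MiClass'>B</table>
<table class='*'>C</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# '*' is compared literally against the class value; it is not a wildcard.
table1 = soup.find("table", {"class": "*"})
table2 = soup.find("table", {"class": "HiClass"})

print(table1.get_text(strip=True))
print(table2.get_text(strip=True))

# When no table has the literal class '*', find() returns None.
only_hi = BeautifulSoup("<table class='HiClass'>A</table>", "html.parser")
no_match = only_hi.find("table", {"class": "*"})
print(no_match)
```

The None result in the last case is probably why the original code fails: any later attribute access on the result would raise an error.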

To get all tables, just use

table = soup.find_all('table')

and you will get every element that uses the <table> tag, returned as a list.

Demo:

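import requests
from bs4 import BeautifulSoup

def get_request(url):
    # Download the page and parse it (the html5lib parser must be installed separately)
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html5lib')
    # Collect every <table> element on the page
    tables = soup.find_all('table')
    return tables

url = 'https://www.w3schools.com/html/html_tables.asp'
print(get_request(url))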
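The question also asks to store the tables in separate HTML files. One possible way to do that is sketched below; the function name save_tables, the output directory "tables", the file naming scheme and the sample input are all assumptions made for illustration, not something from the question:

```python
from pathlib import Path
from bs4 import BeautifulSoup

def save_tables(html, out_dir="tables"):
    """Write every <table> found in html to its own file; return the paths."""
    soup = BeautifulSoup(html, "html.parser")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, table in enumerate(soup.find_all("table"), start=1):
        path = out / f"table_{i}.html"
        # str(table) is the table's markup, including the <table> tag itself
        path.write_text(str(table), encoding="utf-8")
        paths.append(path)
    return paths

# Hypothetical sample input with two tables
sample = (
    "<table class='HiClass'><tr><td>A</td></tr></table>"
    "<p>some text</p>"
    "<table><tr><td>B</td></tr></table>"
)
for p in save_tables(sample):
    print(p)
```

Note that find_all('table') also returns nested tables, so a table inside another table would be written twice: once as part of its parent and once on its own.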
