Use map with return value of soup.select to get table headers

I am using BeautifulSoup to parse a webpage.

from bs4 import BeautifulSoup as Soup

How do I get an array of strings from the table headers? I tried something like this:

soup = Soup(html, features="html.parser")
headers = soup.select("#StatusGrid thead tr").map(lambda x: x.text)

Select does not return a list. Can I inspect the type that it returns?

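Without an example of your HTML it is hard to help; you could provide a URL or some HTML. However, to generate your list with map(), pass the result of select() to map() as its second argument instead of calling .map() on it, then wrap the call in list().

Alternatively, use a list comprehension: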

[x.text for x in soup.select("#StatusGrid thead tr")]

Checking the type gives you bs4.element.ResultSet:

type(soup.select("#StatusGrid thead tr")) 
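To make the type check concrete, here is a minimal sketch that inspects what select() returns. The one-row HTML string is a made-up stand-in, since the question shows no markup. It illustrates that ResultSet is a subclass of list, so the original attempt fails because Python lists have no .map() method, not because the result is something other than a list.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table, used only to have something to select.
html = '<table id="StatusGrid"><thead><tr><td>1</td></tr></thead></table>'
soup = BeautifulSoup(html, features="html.parser")

rows = soup.select("#StatusGrid thead tr")

print(type(rows).__name__)     # ResultSet
print(isinstance(rows, list))  # True: ResultSet subclasses list
print(hasattr(rows, "map"))    # False: lists have no .map() method

# The built-in map() accepts the ResultSet like any other iterable.
texts = list(map(lambda row: row.text, rows))
```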

Example

from bs4 import BeautifulSoup as Soup
html='''
<table id="StatusGrid">
<thead>
<tr><td>1</td></tr>
<tr><td>2</td></tr>
<tr><td>3</td></tr>
</thead>
</table>
'''

soup = Soup(html, features="html.parser")
list(map(lambda x: x.text, soup.select("#StatusGrid thead tr")))

Output

['1', '2', '3']
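If the real page keeps its headers as th cells inside a single header row (an assumption, since the question does not show its HTML), selecting the cells rather than the rows gives one string per column. The table below and its column names are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one header row with three <th> cells.
html = """
<table id="StatusGrid">
  <thead>
    <tr><th> Name </th><th>Status</th><th>Updated</th></tr>
  </thead>
</table>
"""

soup = BeautifulSoup(html, features="html.parser")

# Select each header cell; get_text(strip=True) trims surrounding whitespace.
headers = [th.get_text(strip=True) for th in soup.select("#StatusGrid thead th")]
print(headers)  # ['Name', 'Status', 'Updated']
```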

Use .stripped_strings

From the Docs

.stripped_strings yields Python strings that have had whitespace stripped.

Since it returns a generator you can convert it to a list to have an array of strings.

Here is how to use it.

from bs4 import BeautifulSoup as Soup
html='''
<table id="StatusGrid">
    <thead>
        <tr><td>Heading-1</td></tr>
        <tr><td>Heading-2</td></tr>
        <tr><td>Heading-3</td></tr>
    </thead>
    <tbody>
    </tbody>
</table>
'''

soup = Soup(html, features="lxml")
t = soup.find('table', {'id': 'StatusGrid'}).find('thead')
print(list(t.stripped_strings))

Output

['Heading-1', 'Heading-2', 'Heading-3']
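One thing to watch for with .stripped_strings: it yields every text node separately, so a header cell containing nested tags produces more than one string. The sketch below uses invented markup with a b tag inside a th to show the difference, and compares it with calling get_text() once per cell.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first header cell contains a nested <b> tag.
html = """
<table id="StatusGrid">
  <thead>
    <tr><th>Name <b>(required)</b></th><th>Status</th></tr>
  </thead>
</table>
"""

soup = BeautifulSoup(html, features="html.parser")
thead = soup.select_one("#StatusGrid thead")

# One entry per text node: the nested tag splits the first header in two.
flat = list(thead.stripped_strings)
print(flat)      # ['Name', '(required)', 'Status']

# One entry per cell: join each cell's text nodes with a space.
per_cell = [th.get_text(" ", strip=True) for th in thead.select("th")]
print(per_cell)  # ['Name (required)', 'Status']
```

When every header cell holds plain text only, both approaches return the same list.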


 