Use map with return value of soup.select to get table headers

Question

I am using BeautifulSoup to parse a webpage.

from bs4 import BeautifulSoup as Soup

How do I get an array of strings from the table headers? I tried something like this:

soup = Soup(html, features="html.parser")
headers = soup.select("#StatusGrid thead tr").map(lambda x: x.text)

Select does not return a list. Can I inspect the type that it returns?

Answer 1

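Without a sample of your HTML it is hard to help; please provide a URL or some HTML. That said, map() is a built-in function, not a method of the object that select() returns. Pass the result of select() to map() as its second argument and wrap the call in list().

Alternatively, use a list comprehension: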

[x.text for x in soup.select("#StatusGrid thead tr")]

Checking the type will give you bs4.element.ResultSet, which is a subclass of list:

type(soup.select("#StatusGrid thead tr"))
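To make the type check concrete, here is a minimal sketch. The sample markup is hypothetical and not taken from the asker's page. It suggests that the object returned by select() is list-like but, like any Python list, has no map() method, which is why the attempt in the question fails.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the asker's page.
html = """
<table id="StatusGrid">
<thead>
<tr><td>1</td></tr>
<tr><td>2</td></tr>
</thead>
</table>
"""

soup = BeautifulSoup(html, features="html.parser")
rows = soup.select("#StatusGrid thead tr")

result_type = type(rows).__name__  # class name of the returned object
is_list = isinstance(rows, list)   # ResultSet subclasses list
has_map = hasattr(rows, "map")     # lists have no .map() method

print(result_type, is_list, has_map)
```

Because the result behaves like a list, it can be passed to map(), indexed, or iterated in a comprehension.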

Example

from bs4 import BeautifulSoup as Soup
html='''
<table id="StatusGrid">
<thead>
<tr><td>1</td></tr>
<tr><td>2</td></tr>
<tr><td>3</td></tr>
</thead>
</table>
'''

soup = Soup(html, features="html.parser")
list(map(lambda x: x.text, soup.select("#StatusGrid thead tr")))

Output

['1', '2', '3']
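The example above has one cell per row. If the real page instead has a single header row with several cells (an assumption, since the asker's HTML is not shown), selecting rows would merge all cell texts into one string. A sketch that selects the cells directly, using a hypothetical table with th cells:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one header row containing two <th> cells.
html = """
<table id="StatusGrid">
<thead>
<tr><th>Name</th><th>Status</th></tr>
</thead>
</table>
"""

soup = BeautifulSoup(html, features="html.parser")

# Selecting rows yields one string per row, with the cell texts run together.
row_texts = [tr.text for tr in soup.select("#StatusGrid thead tr")]

# Selecting cells yields one string per header cell.
cell_texts = [th.get_text(strip=True) for th in soup.select("#StatusGrid thead th")]

print(row_texts)
print(cell_texts)
```

Which selector fits depends on how the actual table is laid out.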

Answer 2

Use `.stripped_strings`

From the docs:

.stripped_strings yields Python strings that have had whitespace stripped.

Since it returns a generator, you can convert it to a list to get an array of strings.

Here is how to use it.

from bs4 import BeautifulSoup as Soup
html='''
<table id="StatusGrid">
    <thead>
        <tr><td>Heading-1</td></tr>
        <tr><td>Heading-2</td></tr>
        <tr><td>Heading-3</td></tr>
    </thead>
    <tbody>
    </tbody>
</table>
'''

soup = Soup(html, features="lxml")
t = soup.find('table', {'id': 'StatusGrid'}).find('thead')
print(list(t.stripped_strings))

Output

['Heading-1', 'Heading-2', 'Heading-3']
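To illustrate what the stripping buys you, the following sketch compares .strings with .stripped_strings on similar, shortened markup. It uses the built-in html.parser instead of lxml so that no extra parser needs to be installed; that swap is a choice made for this sketch, not part of the answer above.

```python
from bs4 import BeautifulSoup

# Shortened, indented markup similar to the example above.
html = """
<table id="StatusGrid">
    <thead>
        <tr><td>Heading-1</td></tr>
        <tr><td>Heading-2</td></tr>
    </thead>
</table>
"""

soup = BeautifulSoup(html, features="html.parser")
thead = soup.find("table", {"id": "StatusGrid"}).find("thead")

raw = list(thead.strings)                # includes whitespace-only strings
stripped = list(thead.stripped_strings)  # whitespace-only strings are skipped

has_blank = any(s.strip() == "" for s in raw)
print(has_blank)
print(stripped)
```

Note that .stripped_strings walks every descendant of the element, so cells from all header rows end up in one flat list.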

Answer 1
1 ACCEPTED 2021-11-03 09:16:05

Answer 2
1 2021-11-03 09:50:25