Python - Extracting data from this HTML tag using BS4, instead of getting None

Question

This is my code:

from bs4 import BeautifulSoup

html = '''
<td class="ClassName class" width="60%">Data I want to extract<span lang=EN-UK style="font-size:12pt;font-family:'arial'"></span></td>
'''

soup = BeautifulSoup(html, 'html.parser')

print(soup.select_one('td').string)

It returns None. I think this is caused by the empty span tag: perhaps it descends into that span and returns its contents? I would like to either delete the span tag, stop as soon as it finds 'Data I want to extract', or tell it to ignore empty tags.

If there are no empty tags inside 'td' it actually works.

Is there a way to ignore empty tags in general and go one step back, instead of ignoring this specific span tag?

Sorry if this is too elementary, but I spent a fair amount of time searching.

Answer 1

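Use the .text property, not .string. A tag's .string is None whenever the tag has more than one child, and here the td contains both a text node and the empty span: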

html = '''
<td class="ClassName class" width="60%">Data I want to extract<span lang=EN-UK style="font-size:12pt;font-family:'arial'"></span></td>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

print(soup.select_one('td').text)

Output:

Data I want to extract
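
To see why .string comes back as None here, it may help to look at the children of the td directly. The snippet below is only a small sketch using the markup from the question (with the lang attribute written on one line); the variable names td and children are just for illustration.

```python
from bs4 import BeautifulSoup

html = '''
<td class="ClassName class" width="60%">Data I want to extract<span lang=EN-UK style="font-size:12pt;font-family:'arial'"></span></td>
'''

soup = BeautifulSoup(html, 'html.parser')
td = soup.select_one('td')

# The cell has two children: the text node and the empty <span>.
children = list(td.children)
print(len(children))

# With more than one child, .string is ambiguous, so BeautifulSoup gives None.
print(td.string)

# get_text() joins all text inside the tag; strip=True trims whitespace.
print(td.get_text(strip=True))
```

.text is a shorthand for get_text() with no arguments, so either one should work for this cell.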

Answer 2

Use .text:

>>> soup.find('td').text
u'Data I want to extract'
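
For the more general part of the question (ignoring empty tags everywhere rather than this one span), one possible approach is to remove every tag that has no children before reading .string. This is a rough sketch and not something either answer proposed; the shortened style attribute is just for brevity, and note that the loop would also remove void elements such as br or img, since they have no contents either.

```python
from bs4 import BeautifulSoup

html = '<td class="ClassName class" width="60%">Data I want to extract<span style="font-size:12pt"></span></td>'

soup = BeautifulSoup(html, 'html.parser')

# find_all() returns a list, so removing tags while looping over it is safe.
for tag in soup.find_all():
    # A tag with no contents has neither text nor child tags.
    if not tag.contents:
        tag.decompose()

td = soup.select_one('td')

# The td now has a single text child, so .string returns it.
print(td.string)
```

If you only need the text, .text is simpler; removing the empty tags is mainly useful when other code relies on .string.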

2 answers

solution1
2 ACCEPTED 2018-07-12 15:23:40

solution2
2 2018-07-12 15:24:16
