I have the code below working to dynamically search for specific text within an HTML table's source code and pull the nextSibling of the row where that text was found.
Current Code
import requests
from bs4 import BeautifulSoup

r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
# Find xxxxxxx (row-by-row) and strip leading zeros
row = soup.find_all('td', string="xxxxxxx")
for r in row:
    LE = r.nextSibling
    # Check for None first; LE.name on None would raise AttributeError
    while LE is not None and LE.name != 'td':
        LE = LE.nextSibling
The main issue I am having (it is probably super easy and I have just been staring at this for so long now) is that I need to assign the nextSibling to the LE variable.
LE is formatted as "001234", and I need to strip the leading zeros so that the variable holds "1234".
If I print the variable with print(LE.text[2:6]), the result is correct. Implemented in the code as LE = LE.nextSibling.text[2:6], it does not produce anything.
I have tried the following statements, but neither works, and I am hoping for guidance.
LE = LE.nextSibling.text[2:6]
and
LE = LE.text[2:6]
I need this to be assigned to a variable after extracting to utilize the variable later on within my script. I appreciate the help in advance!
EDIT --> included source code:
<tr>
    <td class='label' nowrap title="xxxxxxx">TEXT TO FIND</td>
    <td class='attribute'>001234</td>
</tr>
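You can use next_sibling twice (the first sibling is the whitespace between the two cells), and then use lstrip("0") to remove the leading zeros:
from bs4 import BeautifulSoup

html = """<tr>
<td class='label' nowrap title="xxxx">TEXT TO FIND</td>
<td class='attribute'>001234</td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")
for tag in soup.select(".label"):
    # The first next_sibling is the newline text node; the second is the value <td>
    value_cell = tag.next_sibling.next_sibling
    # lstrip("0") removes only leading zeros; strip("0") would also drop trailing ones
    le = value_cell.get_text().lstrip("0")
    print(tag.text)
    print(le)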
Output:
TEXT TO FIND
1234
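As a variation on the above, here is a minimal sketch that stores the value in a variable for later use instead of only printing it. It assumes the markup from the question's EDIT; the helper name extract_values is hypothetical. It uses find_next_sibling("td"), which skips the whitespace text node, so next_sibling does not have to be called twice.

```python
from bs4 import BeautifulSoup

html = """<tr>
<td class='label' nowrap title="xxxx">TEXT TO FIND</td>
<td class='attribute'>001234</td>
</tr>"""


def extract_values(markup, label_text):
    """Return the zero-stripped value next to every <td> whose text equals label_text."""
    soup = BeautifulSoup(markup, "html.parser")
    values = []
    for label in soup.find_all("td", string=label_text):
        # find_next_sibling skips text nodes and returns the next <td>, or None
        value_cell = label.find_next_sibling("td")
        if value_cell is None:
            continue
        # lstrip("0") drops leading zeros only
        values.append(value_cell.get_text(strip=True).lstrip("0"))
    return values


values = extract_values(html, "TEXT TO FIND")
le = values[0] if values else None  # available to the rest of the script
print(le)
```

If the value is needed as a number rather than a string, int(value_cell.get_text(strip=True)) would also discard the leading zeros.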
Change != to ==:
row = soup.find_all('td', string="xxxxxx")
for r in row:
    # Assumes the value <td> directly follows the label <td>, with no whitespace text node in between
    LE = r.nextSibling
    LE = LE.text[2:6]
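For comparison, the sibling walk from the question can be sketched with the None check placed first, so that running past the last sibling returns None rather than raising AttributeError. This is only an illustration under the assumption that the cells are separated by newlines as in the posted HTML; the function name next_td is hypothetical.

```python
from bs4 import BeautifulSoup

html = (
    "<table><tr>\n"
    "<td class='label'>TEXT TO FIND</td>\n"
    "<td class='attribute'>001234</td>\n"
    "</tr></table>"
)


def next_td(cell):
    """Walk forward through siblings until a <td> is found; return None if there is none."""
    sibling = cell.next_sibling
    # Test for None before reading .name, otherwise the end of the row raises AttributeError
    while sibling is not None and getattr(sibling, "name", None) != "td":
        sibling = sibling.next_sibling
    return sibling


soup = BeautifulSoup(html, "html.parser")
label = soup.find("td", string="TEXT TO FIND")
value_cell = next_td(label)
# Assign the stripped text to LE so it can be used later in the script
LE = value_cell.get_text().lstrip("0") if value_cell is not None else None
print(LE)
```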