I am trying to extract text from a div tag using BeautifulSoup4 and Python. The following HTML is stored in a file (example.html).
My HTML:
<table class="NZX1058422900" cols="20" style="border-collapse: collapse; width: 1496px;" cellspacing="0" cellpadding="0" border="0">
<tbody>
<td class="A10dbmytr2499b">
<div class="VWP1058422499" alt="Total Cases: 5 - Level 1, Level 2, or On Hold 2 - Completed" title="Total Cases: 5 - Level 1, Level 2, On Hold 2 - Completed">5/2</div>
</td>
</tbody>
</table>
I want the output to look like this:
Total Cases:
5 - Level 1, Level 2, or On Hold
2 - Completed
So far my code is:
from bs4 import BeautifulSoup
openFile = open("C:\\example.html")
readFile = openFile.read()
soup = BeautifulSoup(readFile, "lxml")
I have tried the code below without any success:
soup.find("div", class_="VWP1058422499")
Can anyone help with how the above data can be extracted?
alt = soup.find("div", {"class":"VWP1058422499"}).get("alt")
print(alt)  # .get("alt") returns a plain string, so it has no .text attribute
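To make the difference between the tag's visible text and its attributes concrete, here is a minimal sketch. It assumes the markup is pasted inline (a shortened copy of the snippet from the question) rather than read from example.html, and it uses the built-in "html.parser" instead of "lxml" so nothing extra needs to be installed. The variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Inline, shortened copy of the markup from the question (assumption: same
# class name and alt text), so the sketch runs without example.html.
html = """
<table><tbody><td class="A10dbmytr2499b">
<div class="VWP1058422499" alt="Total Cases: 5 - Level 1, Level 2, or On Hold 2 - Completed">5/2</div>
</td></tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="VWP1058422499")

visible_text = div.get_text()  # the text between the tags
alt_text = div.get("alt")      # the attribute value, or None if it is absent

print(visible_text)
print(alt_text)
```

So soup.find(...) from the question does locate the div; the wanted string simply lives in the alt attribute, not in the tag's text.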
Expanding on the answer from @so1989: since you also want to print the text in the format you specified, I would suggest this approach:
from bs4 import BeautifulSoup

with open("C:\\example.html") as f:
    soup = BeautifulSoup(f.read(), "lxml")

words = soup.find("div", {"class":"VWP1058422499"}).get("alt").split()
lines = [[]]
for i, word in enumerate(words):
    # A count followed by "-" (e.g. "5 -") starts a new output line.
    if i + 1 < len(words) and words[i + 1] == "-":
        lines.append([])
    lines[-1].append(word)
print("\n".join(" ".join(line) for line in lines))
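As an alternative to looping over the words, the same line breaks can be sketched with a single regular expression. This assumes every new group starts with a number followed by " - ", as in the alt text from the question; split_cases is a hypothetical helper name, and text where a number before " - " appears in the middle of a group would need a stricter pattern.

```python
import re

def split_cases(summary: str) -> str:
    """Start a new line before every "<number> - " group."""
    # The lookahead keeps the number itself; only the preceding
    # whitespace is replaced by a newline.
    return re.sub(r"\s+(?=\d+ - )", "\n", summary)

# The alt text from the question, hard-coded here so the sketch is self-contained.
alt = "Total Cases: 5 - Level 1, Level 2, or On Hold 2 - Completed"
formatted = split_cases(alt)
print(formatted)
```

In the real script, the string returned by .get("alt") would be passed to split_cases in place of the hard-coded value.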