I have a free-text column in a pandas DataFrame that contains HTML tags.
ID Free text field
1 <p><span style="background-color: rgb(255, 255, 255); color: rgb(37, 36, 35); font-family:
Arial; font-size: 10.5pt;">TExt1:</span></p><p><span style="background-color: rgb(255, 255,
255); color: rgb(37, 36, 35); font-family: Arial; font-size: 10.5pt;">Score: 5</span></p><p>
<span style="background-color: rgb(255, 255, 255); color: rgb(37, 36, 35); font-family: Arial;
font-size: 10.5pt;">B - </span><span style="background-color: rgb(255, 255, 255); color:
rgb(36, 36, 36); font-family: Arial; font-size: 10.5pt;">TExt2</span></p><p><span
style="background-color: rgb(255, 255, 255); color: rgb(37, 36, 35); font-family: Arial;
font-size: 10.5pt;">Text6</span></p><p><span style="background-color: rgb(255, 255, 255);
color: rgb(37, 36, 35); font-family: Arial; font-size: 10.5pt;">Text3</span></p><p><span
style="background-color: rgb(255, 255, 255); color: rgb(37, 36, 35); font-family: Arial;
font-size: 10.5pt;">Text4</span></p>
2 <p>Text10</p>
3 <p>Sky is blue</p>
4 <p>Text3</p><p><br></p><p>Text19</p>
5 <p> Complaint1</p><p><br></p><p>Text1</p><p>hospo 2</p><p>Tes45</p><p><br></p><p>test</p>
6 <p>Test44</p>
7 <p>Test54</p>
Is there any way I could remove those HTML tags?
Any help would be appreciated.
Thanks
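Try using Beautiful Soup. Its stripped_strings generator yields each piece of text in the markup with surrounding whitespace removed, skipping fragments that are empty: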
from bs4 import BeautifulSoup

# Parse each cell and collect its non-empty text fragments into a list
df['free text'].apply(
    lambda x: list(BeautifulSoup(x, "html.parser").stripped_strings)
)
0 [Text10]
1 [Sky is blue]
2 [Text3, Text19]
3 [Complaint1, Text1, hospo 2, Tes45, test]
4 [Test44]
5 [Test54]
Name: free text, dtype: object
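If a single plain string per row is more convenient than a list, a minimal sketch along the same lines is below. It assumes the column is named 'free text' as in the snippet above; the helper name strip_html, the new column name 'clean text', and the three sample rows are made up for illustration. It uses get_text with a separator and strip=True, and passes non-string cells (such as NaN) through unchanged.

```python
import pandas as pd
from bs4 import BeautifulSoup


def strip_html(value, sep=" "):
    """Return the visible text of an HTML fragment as one string.

    Hypothetical helper: non-string values (NaN, None) are returned as-is.
    """
    if not isinstance(value, str):
        return value
    # strip=True trims each text fragment and drops the empty ones;
    # the remaining fragments are joined with `sep`.
    return BeautifulSoup(value, "html.parser").get_text(separator=sep, strip=True)


# Small sample frame standing in for the real data (illustrative rows only)
df = pd.DataFrame({
    "ID": [2, 3, 4],
    "free text": [
        "<p>Text10</p>",
        "<p>Sky is blue</p>",
        "<p>Text3</p><p><br></p><p>Text19</p>",
    ],
})

df["clean text"] = df["free text"].apply(strip_html)
print(df["clean text"].tolist())
# ['Text10', 'Sky is blue', 'Text3 Text19']
```

Passing sep="\n" instead keeps one line per paragraph, which may suit longer free-text entries better.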