
How to remove all spaces and newlines from text extracted from JSON using Beautiful Soup?

Inside a div container with a specific class, I have some text in dl, dt and dd elements that contains spaces, line breaks and special characters such as \\ and ?. How can I get rid of them?

container = soup.find_all(name="div", attrs={"class":"4_square"})

The length of container is 1. Any suggestions?

You can find all the dt and dd elements and then remove the special characters and whitespace by replacing each of them with an empty string. Try the code below.

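subject = container[0]
# Pair each <dt> label with its <dd> value inside the first <dl>.
for dt, dd in zip(subject.dl.find_all('dt'), subject.dl.find_all('dd')):
    # Drop newlines, spaces, question marks and backslashes from each text.
    temp = dt.text.replace('\n', '').replace(' ', '').replace('?', '').replace('\\', '')
    temp1 = dd.text.replace('\n', '').replace(' ', '').replace('?', '').replace('\\', '')
    print(temp, temp1)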

On each iteration, temp holds the cleaned dt text and temp1 holds the cleaned dd text. I hope this works for you.
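If the chain of replace calls gets long, the same cleanup can be written as one regular expression. The following is only a sketch using the standard library: the helper names clean_text and collapse_whitespace are hypothetical, and the set of characters to delete (whitespace, backslash, question mark) is an assumption based on the question.

```python
import re

# Assumed set of unwanted characters: any whitespace, backslashes, question marks.
_UNWANTED = re.compile(r"[\s\\?]+")


def clean_text(text: str) -> str:
    """Delete all whitespace (spaces, tabs, newlines), backslashes and question marks."""
    return _UNWANTED.sub("", text)


def collapse_whitespace(text: str) -> str:
    """Alternative: keep the words, separated by single spaces."""
    return " ".join(text.split())


sample = "  Opening \n hours? \\ \n"
print(clean_text(sample))           # Openinghours
print(collapse_whitespace(sample))
```

Either helper can be applied to the string returned by dt.get_text() or dd.get_text(). Use collapse_whitespace when the words should stay readable instead of being joined together.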
