
How do I replace each newline with a space, and replace two other strings with a space, in Python?

This is the code that scrapes a particular section of an article on a website.

soup.find("div", {"id": "content_wrapper"}).text

I am supposed to replace each newline ('\n') in the body text with a space (' '). I have done this with: soup.find("div", {"id": "content_wrapper"}).text.replace("\n", " ").strip()

But I still need to replace each '\xa0' and '\u200a' string in the body text with a space (' ') and strip all leading and trailing whitespace.

How do I do this please?

Thank you!

You can chain additional replace calls after the first one. Each call returns a new string, so the next call operates on the result.

text = soup.find('div', {'id': 'content_wrapper'}).text
modified_text = text.replace('\n', ' ').replace('\xa0', ' ').replace('\u200a', ' ').strip()
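To try the chained calls without scraping a page, here is a minimal sketch on a made-up string. The function name clean_body_text and the sample text are assumptions for illustration only; they are not part of the original code.

```python
def clean_body_text(text: str) -> str:
    """Replace newlines, non-breaking spaces and hair spaces with a plain space, then trim."""
    return (
        text.replace("\n", " ")       # newline -> space
        .replace("\xa0", " ")         # non-breaking space -> space
        .replace("\u200a", " ")       # hair space -> space
        .strip()                      # drop leading/trailing whitespace
    )


# Hypothetical sample standing in for soup.find(...).text
sample = "\n  First line\nsecond\xa0part\u200aend \n"
cleaned = clean_body_text(sample)
print(repr(cleaned))
```

Note that strip() only trims the ends of the string; runs of spaces in the middle of the text are left as they are.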

If I understood correctly, you want to remove these whitespace characters as well. In that case, don't replace them with a space (" "); replace them with an empty string ("").

text = soup.find('div', {'id': 'content_wrapper'}).text
modified_text = text.replace('\n', '').replace('\xa0', '').replace('\u200a', '').strip()
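The two variants give different results when the special characters sit between words. The following sketch compares them on a made-up string (the variable names and the sample are assumptions, not taken from the question):

```python
# Hypothetical sample text containing a non-breaking space, a hair space and a newline
raw = "price:\xa010\u200aUSD\nper unit"

# Variant 1: replace each character with a space (what the question asks for)
with_spaces = raw.replace("\n", " ").replace("\xa0", " ").replace("\u200a", " ").strip()

# Variant 2: replace each character with an empty string (removes it entirely)
without_chars = raw.replace("\n", "").replace("\xa0", "").replace("\u200a", "").strip()

print(with_spaces)
print(without_chars)
```

Replacing with an empty string joins neighbouring words together, so it is probably only the right choice if the characters never act as word separators in the scraped text.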

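Alternatively, loop over the characters, check whether each one is an unwanted character, and overwrite it, like this:

string = soup.find('div', {'id': 'content_wrapper'}).text
write = []
for i in string:
    # Blank out non-breaking spaces and hair spaces; keep every other character
    if i in ('\xa0', '\u200a'):
        i = ''
    write.append(i)
# Join the kept characters back into a single string
result = ''.join(write)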
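A character-by-character pass can also be written with str.translate, which applies a mapping to every character in one call. This is a sketch of an alternative technique, not the method used in the answers; the names WHITESPACE_TABLE and clean_with_translate are assumptions for illustration.

```python
# Map each unwanted character to a plain space.
# str.maketrans accepts a dict whose keys are single characters.
WHITESPACE_TABLE = str.maketrans({"\n": " ", "\xa0": " ", "\u200a": " "})


def clean_with_translate(text: str) -> str:
    """Replace the mapped characters in one pass, then trim both ends."""
    return text.translate(WHITESPACE_TABLE).strip()


result = clean_with_translate("a\nb\xa0c\u200ad ")
print(repr(result))
```

To delete the characters instead of replacing them, map them to an empty string or to None in the dict passed to str.maketrans.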

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.
