I am scraping a website, but the output contains escape sequence characters. The characters are the following:
\r \n \t \xa0
I tried the .split() method, but the issue with it is that when the Scrapy crawler doesn't find a value, it scrapes nothing and moves on to the next iteration.
What's the best way to get rid of these characters?
Following is the output:
Python's re.sub can achieve this.
>>> import re
>>> re.sub(r'\s+', ' ', "\t \xa0")
' '
>>> re.sub(r'\s+', ' ', "\t \xa0 py \t \t \xa0 thon")
' py thon'
>>> # You can then use str.strip to get rid of any surrounding spaces
>>> re.sub(r'\s+', ' ', "\t \xa0 py \t \t \xa0 thon").strip()
'py thon'
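To cover the case from the question where the crawler finds no value, the same substitution can be wrapped in a small helper. This is only a sketch: `clean_text` and its `default` parameter are hypothetical names, and it assumes a missing field arrives as `None`, which may not match how your spider reports it.

```python
import re


def clean_text(value, default=""):
    """Collapse runs of whitespace into one space and trim both ends.

    Assumption: a missing field is passed in as None. In that case the
    default is returned instead of raising, so the item is not skipped.
    """
    if value is None:
        return default
    # For str patterns, \s also matches \r, \n, \t and the
    # non-breaking space \xa0.
    return re.sub(r"\s+", " ", value).strip()


print(clean_text("\r\n\t Price:\xa0 42 \n"))  # Price: 42
print(repr(clean_text(None)))                 # ''
```

Calling the helper on every extracted field keeps the item intact even when one selector matches nothing.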