Skipping Escape Sequence Characters Python Scrapy

Question

I was scraping a website but I am getting escape sequence characters with the output. The characters are the following:

\r \n \t \xa0

I tried the .split() method, but the problem is that when the Scrapy crawler doesn't find a value, it doesn't scrape anything for that item and moves on to the next iteration.
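For illustration, here is a minimal sketch of the .split() approach guarded against missing values. It assumes (this is not stated in the question) that the spider calls .split() directly on an extraction result that can be None when nothing matched, which would raise an AttributeError and abort that callback. The function name normalize_whitespace is hypothetical.

```python
def normalize_whitespace(value):
    # Hypothetical helper, assuming `value` is a str or None (None is what
    # a Scrapy selector's .get() returns when nothing matched).
    if value is None:
        return ''
    # str.split() with no arguments splits on any whitespace, including
    # \r, \n, \t and \xa0, and discards empty pieces; join with one space.
    return ' '.join(value.split())
```

With the None guard, a missing field becomes an empty string instead of an exception.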

What's the best way to get rid of these characters?

Following is the output:

Answer 1

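Python's re.sub can do this by replacing every run of whitespace with a single space. In Python 3, \s in a str pattern matches Unicode whitespace, so it covers \r, \n, \t and the non-breaking space \xa0.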

>>> import re
>>> re.sub(r'\s+', ' ', "\t \xa0")
' '
>>> re.sub(r'\s+', ' ', "\t \xa0 py \t \t \xa0 thon")
' py thon'
>>> # You can then use str.strip to get rid of any surrounding spaces
>>> re.sub(r'\s+', ' ', "\t \xa0 py \t \t \xa0 thon").strip()
'py thon'
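The same idea can be wrapped in a small helper for a spider, which also addresses the missing-value problem from the question. This is a minimal sketch, not a Scrapy API: the names WHITESPACE_RE, clean_text and clean_all are hypothetical, and it assumes extracted values arrive as str or None.

```python
import re

# One or more whitespace characters. For str patterns in Python 3, \s
# matches Unicode whitespace, so \r, \n, \t and \xa0 are all covered.
WHITESPACE_RE = re.compile(r'\s+')


def clean_text(value, default=''):
    # Hypothetical helper: a selector that matches nothing gives None from
    # .get(); return a default instead of raising, so the item is still built.
    if value is None:
        return default
    # Collapse each whitespace run to one space, then trim both ends.
    return WHITESPACE_RE.sub(' ', value).strip()


def clean_all(values):
    # Hypothetical helper for lists such as the result of .getall(): clean
    # each string and drop entries that were nothing but whitespace.
    cleaned = (clean_text(v) for v in values)
    return [v for v in cleaned if v]
```

Inside a callback this might be used as clean_text(response.css('h1::text').get()), where the CSS selector is only a placeholder.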

solution1
1 2020-08-08 10:35:49