
Skipping Escape Sequence Characters in Python Scrapy

I am scraping a website, but the output contains escape-sequence characters. The characters are the following:

\r \n \t \xa0

I tried the .split() method, but the problem with it is that when the Scrapy crawler doesn't find a value, it doesn't scrape anything for that item and moves on to the next iteration.
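One likely cause of that skipping, assuming the values come from Scrapy's .get() (which returns None when a selector matches nothing): calling .split() on None raises AttributeError, which aborts processing of that item. A minimal guard might look like this; clean_value is a hypothetical helper, not part of Scrapy:

```python
from typing import Optional


def clean_value(value: Optional[str], default: str = "") -> str:
    """Collapse whitespace in a scraped value; return default if it is None.

    clean_value is a hypothetical helper for illustration, not a Scrapy API.
    """
    if value is None:
        # Selector matched nothing: return a default instead of raising AttributeError.
        return default
    # str.split() with no argument splits on any run of whitespace,
    # including \r, \n, \t and \xa0 (non-breaking space), and drops empty pieces.
    return " ".join(value.split())


print(clean_value("\r\n\t Price:\xa0 42 \t"))  # Price: 42
print(repr(clean_value(None)))                 # ''
```

Scrapy's .get() also accepts a default argument (for example .get(default="")), which avoids the None case at the source.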

What's the best way to remove these characters?

Following is the output:

[image: screenshot of the output]

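Python's re.sub can do this by replacing every run of whitespace with a single space. In Python 3, \s in a str pattern matches Unicode whitespace, so \r, \n, \t and \xa0 (a non-breaking space) are all covered.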

>>> import re
>>> re.sub(r'\s+', ' ', "\t \xa0")
' '
>>> re.sub(r'\s+', ' ', "\t \xa0 py \t \t \xa0 thon")
' py thon'
>>> # You can then use str.strip to get rid of any surrounding spaces
>>> re.sub(r'\s+', ' ', "\t \xa0 py \t \t \xa0 thon").strip()
'py thon'
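The same re.sub call can be applied to every string returned by .getall() (for example from a hypothetical selector such as response.css("td::text").getall()), which often includes whitespace-only text nodes. A sketch, assuming the input is a list of strings; clean_all is a hypothetical name:

```python
import re
from typing import Iterable, List

# In a str pattern, \s matches Unicode whitespace, including \xa0.
_WHITESPACE = re.compile(r"\s+")


def clean_all(texts: Iterable[str]) -> List[str]:
    """Normalize whitespace in each string and drop the ones left empty."""
    cleaned = []
    for text in texts:
        normalized = _WHITESPACE.sub(" ", text).strip()
        if normalized:  # skip whitespace-only text nodes such as "\r\n\t"
            cleaned.append(normalized)
    return cleaned


raw = ["\r\n\t", "\xa0Name:\tAlice ", "\n", "Age:\xa042\r\n"]
print(clean_all(raw))  # ['Name: Alice', 'Age: 42']
```

Dropping empty results keeps stray newlines between tags from turning into blank fields; keep them instead if positions in the list must line up with other columns.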

The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM