I am new to Scrapy and am trying to extract internal links that have a 3-4 digit number in them.
Here's an example of one of the internal links.
https://www.example.com/detail-info/150-exampleurl
Here's my code.
for links in response.css('section.content-current'):
    internal_link = links.xpath('(*//a/@href)').re(r"\d+")
I am able to get all the internal links on the page using this code without .re. Please help me write a regex to extract only the URLs that have 2 or 3 consecutive digits in them.
You can match the whole string if it contains at least three consecutive digits:
.re(r"(?s).*\d{3}.*")
Details
(?s) - a re.S / re.DOTALL inline modifier that makes . match across lines
.* - any zero or more chars, as many as possible
\d{3} - any three digits
.* - any zero or more chars, as many as possible.
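To see what the pattern keeps and what it drops, here is a minimal sketch using only the standard re module rather than a live Scrapy response. The hrefs list is made up for illustration (only the first URL comes from the question), and fullmatch is used here as a stand-in for what .re does with a pattern that has no capturing groups.

```python
import re

# Same pattern as in the answer: the whole string matches only if it
# contains three consecutive digits somewhere.
PATTERN = re.compile(r"(?s).*\d{3}.*")

# Hypothetical hrefs standing in for what links.xpath('(*//a/@href)') yields.
hrefs = [
    "https://www.example.com/detail-info/150-exampleurl",
    "https://www.example.com/about",
    "https://www.example.com/detail-info/42-short",
    "https://www.example.com/detail-info/2048-longer",
]

# Keep only the hrefs that the pattern matches in full.
matching = [href for href in hrefs if PATTERN.fullmatch(href)]

for href in matching:
    print(href)
```

Note that \d{3} sets only a lower bound, so a URL with a longer run of digits is kept as well; the URL with just two digits and the one with none are dropped.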