How to extract internal links with numbers in Scrapy (Python)

Question

I am new to Scrapy and am trying to extract internal links that contain a 3-4 digit number.

Here's an example of one of the internal links.

https://www.example.com/detail-info/150-exampleurl

Here's my code.

for links in response.css('section.content-current'):
    internal_link = links.xpath('(*//a/@href)').re(r"\d+")

I am able to get all the internal links on the page using this code without the .re() call. Please help me write a regex that extracts only the URLs containing 2 or 3 consecutive digits.

Answer 1

You can match the entire string whenever it contains at least three consecutive digits:

.re(r"(?s).*\d{3}.*")

Details

(?s) - a re.S / re.DOTALL inline modifier that makes . match across lines
.* - zero or more of any character, as many as possible (greedy)
\d{3} - three consecutive digits
.* - zero or more of any character, as many as possible (greedy).
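
As a rough illustration of how this pattern behaves, here is a minimal sketch that uses only Python's standard re module rather than a live Scrapy response. The hrefs list and the links_with_digits helper are hypothetical stand-ins for whatever links.xpath('.//a/@href').getall() would return on the real page.

```python
import re

# Hypothetical hrefs standing in for the values extracted from the page.
hrefs = [
    "https://www.example.com/detail-info/150-exampleurl",
    "https://www.example.com/about-us",
    "https://www.example.com/detail-info/42-short",
    "https://www.example.com/detail-info/2024-archive",
]

# Same pattern as in the answer: the whole string, provided it
# contains at least three consecutive digits somewhere.
pattern = re.compile(r"(?s).*\d{3}.*")


def links_with_digits(urls):
    """Return only the URLs that the pattern matches in full (hypothetical helper)."""
    return [url for url in urls if pattern.fullmatch(url)]


matching = links_with_digits(hrefs)
for url in matching:
    print(url)
```

In this sketch the URL with only two digits is skipped, while a four-digit number still matches because it contains three consecutive digits. If an upper bound matters, a stricter pattern would be needed; that is left as an assumption here.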
