
How to extract internal links with numbers in scrapy python

I am new to Scrapy and am trying to extract internal links that contain a 3-4 digit number.

Here's an example of one of the internal links.

https://www.example.com/detail-info/150-exampleurl

Here's my code.

for links in response.css('section.content-current'):
    internal_link = links.xpath('(*//a/@href)').re(r"\d+")

Without the .re call, this code gets all the internal links on the page. Please help me write a regex that extracts only the URLs containing 2 or 3 consecutive digits.

You can match the whole string, provided it contains at least three consecutive digits:

.re(r"(?s).*\d{3}.*")

Details

  • (?s) - the inline form of the re.S / re.DOTALL flag, which makes . match line breaks as well
  • .* - zero or more of any character, as many as possible (greedy)
  • \d{3} - three consecutive digits
  • .* - zero or more of any character, as many as possible (greedy).
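
To see what the pattern keeps and drops, here is a minimal sketch using only Python's re module outside of Scrapy. The hrefs list is made up for illustration and stands in for whatever the XPath query returns on the real page. The second pattern is not part of the answer above: it is a stricter variant, assuming the goal is a run of exactly 3 or 4 digits that is not part of a longer number.

```python
import re

# Hypothetical hrefs, standing in for the strings the XPath query would return.
hrefs = [
    "https://www.example.com/detail-info/150-exampleurl",
    "https://www.example.com/about-us",
    "https://www.example.com/detail-info/7-short",
    "https://www.example.com/detail-info/20245-long",
]

# Pattern from the answer: the whole string, as long as it has three digits in a row.
at_least_three = re.compile(r"(?s).*\d{3}.*")
matched = [h for h in hrefs if at_least_three.fullmatch(h)]

# Stricter variant (an assumption about the intent): a digit run of length 3 or 4,
# with no digit directly before or after it.
three_or_four = re.compile(r"(?<!\d)\d{3,4}(?!\d)")
strict = [h for h in hrefs if three_or_four.search(h)]

print(matched)  # the 150 and 20245 links
print(strict)   # only the 150 link
```

The first pattern also accepts longer numbers such as 20245, because any five-digit run contains three digits in a row. If that is not wanted, the lookarounds in the second pattern are one way to rule it out.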

