
Problem crawling a link that contains '#'

I'm trying to use Scrapy to crawl the pages of a "category" on a website. I scrape the number of pages first, but when I then call `response.follow(link, callback)` it only works once, and `response.url` inside the callback does not contain the page number. My code:

```python
for category_page in self.category_pages:
    # Drop any trailing slash before appending the page suffix
    link = category_page['catLink'].rstrip("/")
    total_pages = category_page['numPages']
    for i in range(1, total_pages + 1):
        # The page number ends up after '#', i.e. in the URL fragment
        next_url = f"{link}/#{i}/"
        print(next_url)
        yield response.follow(next_url, callback=self.parse_catPage)
```

I tried ignoring robots.txt, with no success. It "works" when I remove the `#` from the link, though.
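Why only the first request seems to go through: everything after `#` is a URL fragment, which browsers and HTTP clients do not send to the server. Scrapy's default request fingerprinting, which its duplicate filter relies on, also canonicalizes URLs without the fragment, so `.../#2/`, `.../#3/` and so on are most likely dropped as duplicates of `.../#1/` (the log typically shows a "Filtered duplicate request" message). The sketch below imitates that behaviour with the standard library's `urldefrag`; it is a simplification, not Scrapy's actual implementation, and the category URL and page count are hypothetical.

```python
from urllib.parse import urldefrag


def unique_requests(urls):
    """Keep the first URL for each fragment-less form, roughly how a
    duplicate filter that ignores fragments behaves (simplified sketch)."""
    seen = set()
    kept = []
    for url in urls:
        key, _fragment = urldefrag(url)  # the fragment is never sent to the server
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


link = "https://example.com/category/shoes"  # hypothetical category URL
total_pages = 5                               # hypothetical page count

with_hash = [f"{link}/#{i}/" for i in range(1, total_pages + 1)]
without_hash = [f"{link}/{i}/" for i in range(1, total_pages + 1)]

print(len(unique_requests(with_hash)))     # 1: every URL reduces to ".../shoes/"
print(len(unique_requests(without_hash)))  # 5: the page number is part of the path
```

This also explains why removing the `#` "works": the page number then becomes part of the path, so each request is distinct. Passing `dont_filter=True` to `response.follow` would stop the filtering, but since the fragment never reaches the server, each request would most likely return the same first page.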

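The "#" is probably just an anchor (a URL fragment), as my predecessor already said. Use your browser's developer tools (the Network tab) to find the request that actually loads the results, and reproduce that request in your script. My bet is that the pages are loaded through AJAX calls, but I can't say more without the target URL.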
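If the page numbers after `#` are handled by JavaScript, the listing is probably fetched by a separate XHR/fetch request that the spider can call directly. A minimal sketch of building one URL per page for such an endpoint follows; the endpoint URL and the `category`/`page` parameter names are hypothetical placeholders, to be replaced with whatever the Network tab actually shows.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: copy the real one from the XHR/Fetch request
# that appears in the browser's Network tab when you change pages.
API_URL = "https://example.com/api/category/products"


def page_request_urls(category_id, total_pages):
    """Build one URL per page for a paginated endpoint.

    The 'category' and 'page' query parameter names are assumptions."""
    return [
        f"{API_URL}?{urlencode({'category': category_id, 'page': page})}"
        for page in range(1, total_pages + 1)
    ]


for url in page_request_urls("shoes", 3):
    print(url)
```

Each of these URLs could then be yielded with `response.follow(url, callback=self.parse_catPage)`; because the page number now sits in the query string, the requests are no longer treated as duplicates. If the endpoint returns JSON, recent Scrapy versions let the callback parse it with `response.json()`, and if the real request turns out to be a POST, `scrapy.FormRequest` with `formdata=` is the usual way to reproduce it.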

