Problem crawling a link that contains: '#'

Question

I'm trying to use Scrapy to crawl the pages of a 'category' on a website. I first get the number of pages, but when I call response.follow(link, callback), it only works once, and response.url inside the callback does not contain the page number. My code:

for category_page in self.category_pages:
    link = category_page['catLink']
    # Drop a single trailing slash so the page suffix can be appended
    if link.endswith("/"):
        link = link[:-1]
    total_pages = category_page['numPages']
    # Request every page of the category: .../#1/, .../#2/, ...
    for i in range(1, total_pages + 1):
        next_url = f"{link}/#{i}/"
        print(next_url)
        yield response.follow(next_url, callback=self.parse_catPage)

I tried ignoring robots.txt, with no success. It does "work" when I remove the # from the link, though.
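Everything after '#' is a URL fragment. Browsers never send it to the server, and Scrapy's default request fingerprint also drops it, so requests that differ only in the fragment count as duplicates and only the first one is crawled. A minimal stdlib sketch of that collapse, using urllib.parse.urldefrag; the category URL and the helper names page_urls and urls_sent_to_server are hypothetical, for illustration only:

```python
from urllib.parse import urldefrag


def page_urls(category_link, total_pages):
    """Build page URLs the same way as the loop in the question."""
    if category_link.endswith("/"):
        category_link = category_link[:-1]
    return [f"{category_link}/#{i}/" for i in range(1, total_pages + 1)]


def urls_sent_to_server(urls):
    """Strip the fragment, leaving only the part that reaches the server.

    Scrapy's default request fingerprint ignores the fragment too, so
    requests that differ only after '#' look identical to its dupefilter.
    """
    return [urldefrag(url).url for url in urls]


# Hypothetical category URL, for illustration only
urls = page_urls("https://example.com/category/shoes/", 3)
print(urls)
print(set(urls_sent_to_server(urls)))
```

All three generated URLs reduce to the same address once the fragment is removed, which matches the "only works once" symptom and explains why removing the '#' changes the behaviour.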

Answer 1

The "#" is probably just an anchor, as a previous answer already said. Use your browser's network tools to find the request that loads the results, and reproduce that request in your script. My bet is that the pages are loaded through AJAX calls, but I can't say more without the target URL.
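Once the network tab shows the request behind each page, the page number can go somewhere the server actually receives, such as the query string. A minimal sketch, assuming the site loads each page with a GET request that takes a page number; AJAX_ENDPOINT, PAGE_PARAM and ajax_page_urls are hypothetical placeholders, not taken from the real site:

```python
from urllib.parse import urldefrag, urlencode

# Hypothetical endpoint and parameter name: replace them with the ones
# seen in the browser's network tab while paging through the category.
AJAX_ENDPOINT = "https://example.com/category/shoes/ajax"
PAGE_PARAM = "page"


def ajax_page_urls(total_pages):
    """Put the page number in the query string instead of the fragment."""
    return [
        f"{AJAX_ENDPOINT}?{urlencode({PAGE_PARAM: i})}"
        for i in range(1, total_pages + 1)
    ]


urls = ajax_page_urls(3)
# The query string is sent to the server, so the URLs stay distinct
# even after any fragment is stripped.
distinct = {urldefrag(u).url for u in urls}
print(len(distinct))  # 3
```

Inside the spider, each of these URLs would then be yielded with response.follow(url, callback=self.parse_catPage). Passing dont_filter=True to the original '#' requests would get them past the duplicate filter, but the server would still receive the same URL every time and return the same page.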

1 answers

solution1
1 2020-11-24 12:28:16