
Python Scrapy not executing scrapy.Request callback function for every link

I am trying to write an eBay spider that goes through each product link on a listing page, visits each link, and does something with the resulting page in the parse_link function.

I am scraping this link (the URL in start_urls below).

In the parse function, it iterates over every link and prints each one correctly, but the parse_link callback is called for only one link per page.

Each page has 50 or so products. I extract each product link and want to visit every one of them and do something in the parse_link function.

But for each page, parse_link gets called for only one link (out of the 50 or so).

Here is the code:

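import scrapy
from urlparse import urljoin  # Python 2, matching the print statements below

from ebay.items import EbayItem  # item class from items.py (shown further down)

# The counters c and c2 are module-level globals; their definitions are not shown in the post.


class EbayspiderSpider(scrapy.Spider):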
    name = "ebayspider"
    #allowed_domains = ["ebay.com"]
    start_urls = ['http://www.ebay.com/sch/hfinney/m.html?item=132127244893&rt=nc&_trksid=p2047675.l2562']

    def parse(self, response):
        global c

        for attr in response.xpath('//*[@id="ListViewInner"]/li'):
            item = EbayItem()
            linkse = '.vip ::attr(href)'
            link = attr.css('a.vip ::attr(href)').extract_first()
            c+=1
            print '', 'I AM HERE', link, '\t', c
            yield scrapy.Request(link, callback=self.parse_link, meta={'item': item})
        next_page = '.gspr.next ::attr(href)'
        next_page = response.css(next_page).extract_first()
        print '\nI AM NEXT PAGE\n'
        if next_page:
            yield scrapy.Request(urljoin(response.url, next_page), callback=self.parse)

    def parse_link(self, response):
        global c2
        c2+=1
        print '\n\n\tIam in parselink\t', c2

As you can see, for every 50 or so links Scrapy executes parse_link only once. I am using global variables to print how many links were extracted and how many times parse_link was executed.

shady@shadyD:~/Desktop/ebay$ scrapy crawl ebayspider
ENTER THE URL TO SCRAPE : http://www.ebay.com/sch/hfinney/m.html?item=132127244893&rt=nc&_trksid=p2047675.l2562
2017-05-13 22:44:31 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: ebay)
2017-05-13 22:44:31 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'ebay.spiders', 'SPIDER_MODULES': ['ebay.spiders'], 'BOT_NAME': 'ebay'}
2017-05-13 22:44:32 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2017-05-13 22:44:33 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:38079/session {"requiredCapabilities": {}, "desiredCapabilities": {"platform": "ANY", "browserName": "chrome", "version": "", "chromeOptions": {"args": [], "extensions": []}, "javascriptEnabled": true}}
2017-05-13 22:44:33 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2017-05-13 22:44:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-05-13 22:44:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-05-13 22:44:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-05-13 22:44:33 [scrapy.core.engine] INFO: Spider opened
2017-05-13 22:44:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-05-13 22:44:33 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-05-13 22:44:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.ebay.com/sch/hfinney/m.html?item=132127244893&rt=nc&_trksid=p2047675.l2562> (referer: None)
 I AM HERE http://www.ebay.com/itm/Cat-Caterpillar-Excavator-Monitor-320B-320BL-320BLN-321B-322BL-325BL-151-9385-/361916086833?hash=item5443e13a31:g:NMwAAOSwX~dWomWJ   1
 I AM HERE http://www.ebay.com/itm/257954A1-New-Case-580SL-580SM-580SL-Series-2-Backhoe-Loader-Hydraulic-Pump-/361345120303?hash=item5421d8f82f:g:KQEAAOSwBLlVVP0X  2
 I AM HERE http://www.ebay.com/itm/Case-580K-forward-reverse-transmission-shuttle-kit-includ-NEW-PUMP-SEALS-GASKETS-/110777599002?hash=item19cadc041a:g:QBgAAOSwh-1W2GkE    3
 I AM HERE http://www.ebay.com/itm/Case-Loader-Backhoe-580L-Hydraulic-Pump-130258A1-130258A2-15-spline-NEW-/361889539361?hash=item54424c2521:g:nzgAAOSw9GhYiQzz 4
 I AM HERE http://www.ebay.com/itm/Hitachi-EX60-PLAIN-Excavator-Service-Manual-Shop-Repair-Book-KM-099-00-KM09900-/132118077640?hash=item1ec2d9e0c8:g:DLkAAOxyVLNS6Cj7  5
 I AM HERE http://www.ebay.com/itm/CAT-Caterpillar-416E-420D-420E-428D-Backhoe-3054c-C4-4-engine-TurboCharger-turbo-/361576953143?hash=item542faa7537:g:I78AAOSw3ihXTZwm    6
 I AM HERE http://www.ebay.com/itm/CAT-Caterpillar-excavator-311B-312-312B-Stepping-Throttle-Motor-1200002-120-0002-/131402610746?hash=item1e9834b83a:g:hBUAAOSwpdpVX4DS    7
 I AM HERE http://www.ebay.com/itm/Fuel-Cap-Case-Backhoe-Skid-Steer-1845c-1845-1840-1835-1835b-1835c-diesel-or-gas-/132102578279?hash=item1ec1ed6067:g:LCYAAOSwGYVXCDJ4     8
 I AM HERE http://www.ebay.com/itm/CAT-Caterpillar-excavator-312C-312CL-Stepping-Throttle-Motor-247-5207-2475207-/112125482091?hash=item1a1b33146b:g:1wAAAOSw9IpX0HLt   9
 I AM HERE http://www.ebay.com/itm/AT179792-John-Deere-Loader-Backhoe-310E-310G-310K-310J-710D-Hydraulic-Pump-NEW-/111290280036?hash=item19e96ae864:g:hxQAAOSw2GlXEW8g  10
 I AM HERE http://www.ebay.com/itm/L32129-CASE-580C-480C-Brake-master-cylinder-REPAIR-KIT-480B-580B-530-570-480-430-/112228195723?hash=item1a21525d8b:g:lWEAAOSwux5YRucG    11
 I AM HERE http://www.ebay.com/itm/John-Deere-210C-310C-310D-310E-410B-410C-510C-710C-King-pin-Kingpin-kit-T184816-/112266699462?hash=item1a239de2c6:g:~qAAAOSw44BYfmcP     12
 I AM HERE http://www.ebay.com/itm/Case-257948A1-580L-580L-580SL-580M-580SM-590SL-590SM-Series-2-Coupler-17-spline-/131506726034?hash=item1e9e696492:g:ZnkAAOSwPgxVTNAx     13
 I AM HERE http://www.ebay.com/itm/Construction-Equipment-key-set-John-Deere-Hitachi-JD-JCB-excavator-backhoe-multi-/360445978301?hash=item53ec4126bd:g:1HkAAMXQlUNRLOiF    14
 I AM HERE http://www.ebay.com/itm/Case-580C-580E-forward-reverse-transmission-shuttle-kit-includ-NEW-SEALS-GASKETS-/361588374712?hash=item543058bcb8:g:kOYAAOSwDuJW2Gna    15
 I AM HERE http://www.ebay.com/itm/John-Deere-300D-310D-315D-TRANSMISSION-REVERSER-SOLENOID-ASSEMBLY-EARLY-AT163601-/361435304759?hash=item5427391337:g:5rsAAOSwnipWXft4    16
 I AM HERE http://www.ebay.com/itm/Bobcat-743-Service-Manual-Book-Skid-steer-6566109-/131768685855?hash=item1eae06951f:g:rgcAAOSwQgpW~nqW   17
 I AM HERE http://www.ebay.com/itm/Cat-Caterpillar-Excavator-Monitor-320C-312c-330c-325c-1573198-157-3198-panel-/112063225844?hash=item1a177d1ff4:g:BtgAAOSwepZXTfZ~    18
 I AM HERE http://www.ebay.com/itm/Ford-NEW-HOLLAND-Loader-BACKHOE-Hydraulic-pump-550-535-555-D1NN600B-Cessna-/360202190657?hash=item53ddb93f41:g:3gkAAOSwPgxVP5VF  19
 I AM HERE http://www.ebay.com/itm/87435827-New-Case-590SL-590SM-Series-1-2-Backhoe-Loader-Hydraulic-oil-Pump-14S-/131992359553?hash=item1ebb5b9281:g:KQEAAOSwBLlVVP0X  20
 I AM HERE http://www.ebay.com/itm/CAT-Caterpillar-excavator-311B-312-312B-Stepping-Throttle-Motor-2475227-247-5227-/111677605339?hash=item1a008105db:g:stsAAOSwNSxVX4kG    21
 I AM HERE http://www.ebay.com/itm/Cat-Caterpillar-938H-950H-962H-416E-Wheel-Loader-Locking-Fuel-Tank-Cap-2849039-/111446084638?hash=item19f2b44c1e:g:u0IAAOxy1klRdqOQ  22
 I AM HERE http://www.ebay.com/itm/FORD-BACKHOE-Hydraulic-pump-555C-555D-655D-E7NN600CA-/361376010222?hash=item5423b04fee:g:UdkAAOSwu4BV4J6T    23
 I AM HERE http://www.ebay.com/itm/John-Deere-Excavator-AT154524-High-Speed-Solenoid-valve-490E-790ELC-790E-pump-/131623918235?hash=item1ea5659a9b:g:o-EAAOSwo0JWF~PC   24
 I AM HERE http://www.ebay.com/itm/John-Deere-350C-450C-Dozer-Loader-Arm-Rest-PAIR-SEAT-/360164308266?hash=item53db77352a:m:m-79tleHP2PC3zD-HqRPMQw     25
 I AM HERE http://www.ebay.com/itm/Caterpillar-Cat-D3-D3B-D3C-D4B-D4C-D4H-D5C-Dozer-3204-Engine-water-pump-NEW-/112061839578?hash=item1a1767f8da:g:6x0AAOSwIgNXjkNm     26
 I AM HERE http://www.ebay.com/itm/International-IH-TD5-OLD-Crawler-Dozer-Seat-cushions-/110840656548?hash=item19ce9e32a4:m:mu5f6-grIZNQVtDoLSDcDJg     27
 I AM HERE http://www.ebay.com/itm/Cat-Caterpillar-D3C-Series-III-D4G-D4H-8E4148-Arm-rests-rest-cushion-Dozer-seat-/131827423319?hash=item1eb186d857:g:JxMAAOSwQaJXRdzW     28
 I AM HERE http://www.ebay.com/itm/Cat-Caterpillar-Excavator-Monitor-320C-321C-322C-325C-260-2160-2602160-gauge-/112014409886?hash=item1a1494409e:g:BtgAAOSwepZXTfZ~    29
 I AM HERE http://www.ebay.com/itm/John-Deere-JD-NON-Turbo-Muffler-AT83613-210C-300D-310C-310D-315C-315D-400G-410B-/361917008791?hash=item5443ef4b97:g:U0wAAOSw~CRTpFsn     30
 I AM HERE http://www.ebay.com/itm/John-Deere-210C-310D-Shuttle-transmission-Overhaul-Kit-With-Pump-Forward-Reverse-/361916993624?hash=item5443ef1058:g:8cUAAOSwDNdVp7-1    31
 I AM HERE http://www.ebay.com/itm/AT318659-AT139444-John-Deere-Loader-Brake-Hydraulic-Pump-NEW-SURPLUS-544E-544G-/132040240495?hash=item1ebe362d6f:g:mRMAAOSwJ7RYWWUF  32
 I AM HERE http://www.ebay.com/itm/Hitachi-EX60-PLAIN-Excavator-PARTS-Manual-Book-P10717-P107E16-Machine-Comp-/132110375418?hash=item1ec26459fa:g:rbwAAOSwPe1UAQal  33
 I AM HERE http://www.ebay.com/itm/Cat-Caterpillar-D2-ENGINE-SERVICE-REPAIR-manual-book-D311-212-motor-grader-/360724733057?hash=item53fcde9c81:m:mfYRAKtemeCg_HnjxHAiO0w   34
 I AM HERE http://www.ebay.com/itm/Cat-Caterpillar-Excavator-Monitor-312C-315C-318C-319C-260-2160-2602160-gauge-/131833751423?hash=item1eb1e7677f:g:BtgAAOSwepZXTfZ~    35
 I AM HERE http://www.ebay.com/itm/121335A1-Case-580L-580L-Series-2-Backhoe-Throttle-Cable-BENT-77-75-LONG-BEND-/361891435313?hash=item5442691331:g:lgcAAOSwhOdXogxu    36
 I AM HERE http://www.ebay.com/itm/Heavy-Construction-Equipment-21-Key-Set-Cat-Case-Deere-Komatsu-Volvo-Truck-Laser-/111018804148?hash=item19d93c83b4:m:mm5Eephzc48HDdiNjCCaxtg     37
 I AM HERE http://www.ebay.com/itm/CAT-Caterpillar-320B-322B-325B-throttle-motor-governor-2475232-247-5232-5-pin-/112183024608?hash=item1a1ea11be0:g:4bUAAOSwXeJYESNh   38
 I AM HERE http://www.ebay.com/itm/John-Deere-REAR-Window-BOTTOM-300D-310D-310E-410D-410E-510D-710D-Backhoe-T132952-/111788475468?hash=item1a071cc44c:m:mM6nkmXre_mrGj9gBQbSQHQ     39
 I AM HERE http://www.ebay.com/itm/JD-John-Deere-200CLC-120CLC-Excavator-Cab-Front-Upper-Glass-Window-4602562-120C-/361479558328?hash=item5429dc54b8:g:WvEAAOSw2s1Uz-er     40
 I AM HERE http://www.ebay.com/itm/Hitachi-Excavator-Front-Lower-Glass-Window-4369588-/110718985349?hash=item19c75da485:m:mettchbVo-QopfqTgIqtY3g   41
 I AM HERE http://www.ebay.com/itm/Caterpillar-D6M-D6N-D6R-D8R-Suspension-Seat-6W9744-Cat-/361294230211?hash=item541ed072c3:g:3wAAAOSwNSxVULZJ  42
 I AM HERE http://www.ebay.com/itm/Komatsu-D20A-3-D20P-7-D21P-7-Dozer-Track-Adjuster-Seal-Kit-909036-WITH-BUSHING-/132165283763?hash=item1ec5aa2fb3:g:-0MAAOSwdzVXl3CN  43
 I AM HERE http://www.ebay.com/itm/Locking-Fuel-Cap-John-Deere-310S-310SE-410E-backhoe-AT176378-NEW-310-S-SE-410-E-/361853261989?hash=item54402298a5:g:NUIAAOSwOtdYUEnj     44
 I AM HERE http://www.ebay.com/itm/John-Deere-450G-455G-550G-555G-650G-Dozer-Loader-Arm-Rest-rests-/361912161141?hash=item5443a55375:g:7rkAAOSw3xJVVhwe     45
 I AM HERE http://www.ebay.com/itm/John-Deere-AT418735-RIGHT-bucket-Handle-CT322-240-250-260-270-Skid-Steer-loader-/112335938162?hash=item1a27be6272:g:A2MAAOSwTM5YyYCc     46
 I AM HERE http://www.ebay.com/itm/Caterpillar-Cat-Tooth-Penetration-Rock-Tip-220-9092-2209092-320C-320D-325C-325D-/361928972291?hash=item5444a5d803:g:nGsAAOxy4YdTV~Qx     47
 I AM HERE http://www.ebay.com/itm/John-Deere-AT418734-LEFT-Bucket-Handle-CT322-240-250-260-270-Skid-Steer-loader-/132127244893?hash=item1ec365c25d:g:5doAAOSwax5YyYAH  48
 I AM HERE http://www.ebay.com/itm/4N9618-CAT-Caterpillar-977L-966C-235-D6C-3306-ENGINE-caterpiller-dozer-loader-/112360381857?hash=item1a29335da1:g:dLsAAOSwuLZY5lPU   49
 I AM HERE http://www.ebay.com/itm/Bobcat-763-763F-Service-Manual-Book-Skid-steer-6900091-repair-shop-book-/131531875901?hash=item1e9fe9263d:g:VUsAAOxyOlhS0EiN 50

I AM NEXT PAGE

2017-05-13 22:44:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.ebay.com/itm/Bobcat-763-763F-Service-Manual-Book-Skid-steer-6900091-repair-shop-book-/131531875901?hash=item1e9fe9263d:g:VUsAAOxyOlhS0EiN> (referer: http://www.ebay.com/sch/hfinney/m.html?item=132127244893&rt=nc&_trksid=p2047675.l2562)


    Iam in parselink    2
2017-05-13 22:44:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.ebay.com/sch/m.html?item=132127244893&_ssn=hfinney&_pgn=2&_skc=50&rt=nc> (referer: http://www.ebay.com/sch/hfinney/m.html?item=132127244893&rt=nc&_trksid=p2047675.l2562)
 I AM HERE http://www.ebay.com/itm/Hitachi-EX120-3-Excavator-Service-Technical-WorkShop-Manual-Shop-KM135E00-/361971788377?hash=item5447332a59:g:uXEAAMXQEgpTERZv   51
 I AM HERE http://www.ebay.com/itm/Cat-Caterpillar-Excavator-Monitor-320B-320BL-320BLN-321B-322BL-325BL-106-0172-/112208711245?hash=item1a20290e4d:g:NMwAAOSwX~dWomWJ   52
 I AM HERE http://www.ebay.com/itm/CAT-Caterpillar-D4D-Seat-Cushion-Set-Arm-Rest-Dozer-9M6702-8K9100-3K4403-NEW-/111027276253?hash=item19d9bdc9dd:g:taYAAMXQhuVROmSf    53
 I AM HERE http://www.ebay.com/itm/FORD-555E-575E-655E-675E-BACKHOE-GLASS-WINDOW-DOOR-UPPER-RH-LH-85801626-/111004314632?hash=item19d85f6c08:g:kSkAAOxyzHxRL8~e 54
 I AM HERE http://www.ebay.com/itm/187-8391-1878391-Caterpillar-Cat-Oil-Cooler-939C-D4C-D5C-933C-D3C-Series-3-/132036431899?hash=item1ebdfc101b:g:VhQAAOSw3YNXYtcn  55
 I AM HERE http://www.ebay.com/itm/A137187-CASE-BACKHOE-Power-Steering-pump-480B-580B-530-NEW-A36559-/132028859390?hash=item1ebd8883fe:g:HMsAAOSwzOxUWpVL   56
 I AM HERE http://www.ebay.com/itm/Cat-Caterpillar-953-7N5538-Exhaust-flex-pipe-EARLY-S-N-/361407787737?hash=item54259532d9:g:n3YAAOSwo6lWHQOL  57
 I AM HERE http://www.ebay.com/itm/LINKBELT-Excavator-locking-Fuel-Cap-with-keys-KHH0140-/131504146758?hash=item1e9e420946:g:FHUAAOSwPhdVSLkJ   58
 I AM HERE http://www.ebay.com/itm/Cat-Caterpillar-D4H-D5H-D6D-EXHAUST-PIPE-LOCKING-RAIN-CAP-5-INCH-/131962111459?hash=item1eb98e05e3:g:0fgAAOSwpLNX9qT1    59
 I AM HERE http://www.ebay.com/itm/Caterpillar-CAT-Dozer-D5C-D5G-rear-sprocket-segments-NEW-1979677-1979678-CR6602-/361403972171?hash=item54255afa4b:g:qJsAAOSwLqFV9tkk     60
 I AM HERE http://www.ebay.com/itm/John-Deere-4265372-RPM-sensor-110-120-160C-200C-330CLC-490E-790ELC-892E-HITACHI-/131567763291?hash=item1ea20cbf5b:g:PZYAAOSwPcVVup-H     61
 I AM HERE http://www.ebay.com/itm/CATERPILLAR-D3B-931B-arm-rests-9C4136-5G2621-/360160327148?hash=item53db3a75ec:m:mY4iFhRua2zcfV6IL5i8csQ     62
 I AM HERE http://www.ebay.com/itm/Bobcat-864-Operation-Maintenance-Manual-Book-6900953-operator-skid-steer-Track-/131664897965?hash=item1ea7d6e7ad:g:exkAAOSwcBhWXem~  63
 I AM HERE http://www.ebay.com/itm/Case-550G-650G-750G-850G-1150G-arm-rests-194738A1-144427A1-seat-cushion-crawler-/112393155898?hash=item1a2b27753a:g:GVEAAOSw5L9XDoN-     64
 I AM HERE http://www.ebay.com/itm/7834-41-3002-7834-41-3003-Komatsu-PC300-7-PC360-7-PC400-7-Throttle-motor-/132135899267?hash=item1ec3e9d083:g:ulMAAOSw4A5Y1Agl    65
 I AM HERE http://www.ebay.com/itm/CAT-Caterpillar-955H-Crawler-Loader-Dozer-Parts-Manual-Book-NEW-60A8413-and-up-/361855690487?hash=item544047a6f7:g:FeUAAOSwux5YVDfu  66
 I AM HERE http://www.ebay.com/itm/Case-580CK-530-530ck-2wd-Power-Steering-cylinder-A37859-A37509-/111184835276?hash=item19e321f2cc:g:h~QAAOxyGstR8DSu  67
 I AM HERE http://www.ebay.com/itm/Case-Backhoe-580-SUPER-L-580L-590SL-Radiator-234876A1-234876A2-Metal-tank-580SL-/111646548306?hash=item19fea72152:g:3igAAOxyI8lR8TnL     68
 I AM HERE http://www.ebay.com/itm/Dresser-International-TD7C-TD8C-TD7E-TD12-TD15E-Dozer-Fuel-Cap-701922C2-103768C1-/132062834112?hash=item1ebf8eedc0:g:-CEAAOSwImRYeOug    69
 I AM HERE http://www.ebay.com/itm/JD-John-Deere-120-160LC-200LC-230LC-Excavator-Cab-Door-Lower-Glass-4383401-/360651229974?hash=item53f87d0b16:g:fhUAAMXQDfdRqPQ5  70
 I AM HERE http://www.ebay.com/itm/New-Holland-LB75b-loader-backhoe-operators-manual-operator-operation-maintenance-/361287895632?hash=item541e6fca50:g:1WAAAOSwAvJW9X~t    71
 I AM HERE http://www.ebay.com/itm/Bobcat-743-early-parts-Manual-Book-Skid-steer-loader-6566179-/112084996042?hash=item1a18c94fca:g:wAoAAOxykmZTNY92    72
 I AM HERE http://www.ebay.com/itm/Dresser-TD15E-Operator-Maintenance-Manual-International-crawler-dozer-operation-/111385189587?hash=item19ef131cd3:g:qDYAAOSwnQhXohwA     73
 I AM HERE http://www.ebay.com/itm/FORD-555E-575E-655E-675E-BACKHOE-GLASS-WINDOW-REAR-BACK-85801632-/360573341694?hash=item53f3d88ffe:g:nDQAAOxyyF5RL9H2    74
 I AM HERE http://www.ebay.com/itm/DEERE-160LC-200LC-230LC-330LC-370-GLASS-LOWER-AT214097-/361070972976?hash=item541181d030:m:mettchbVo-QopfqTgIqtY3g   75
 I AM HERE http://www.ebay.com/itm/John-Deere-NEW-Turbocharger-turbo-545D-590D-595-495D-EXCAVATOR-JD-RE26342-NEW-/131458659790?hash=item1e9b8bf5ce:g:3c4AAOxyu4dRwzW4   76
 I AM HERE http://www.ebay.com/itm/FORD-555E-575E-655E-675E-BACKHOE-GLASS-WINDOW-DOOR-FRONT-LOWER-LH-85801623-/361342507318?hash=item5421b11936:g:ZbYAAOSwPcVVpsif  77
 I AM HERE http://www.ebay.com/itm/CAT-Caterpillar-excavator-311B-312-312B-Stepping-Throttle-Motor-247-5231-1190633-/132186922816?hash=item1ec6f45f40:g:hBUAAOSwpdpVX4DS    78
 I AM HERE http://www.ebay.com/itm/Cat-Caterpillar-Excavator-Monitor-330C-260-2160-2602160-gauge-/361578440228?hash=item542fc12624:g:BtgAAOSwepZXTfZ~   79
 I AM HERE http://www.ebay.com/itm/John-Deere-210C-310D-Shuttle-Reverser-Overhaul-Kit-With-Pump-Forward-Reverse-/131963132435?hash=item1eb99d9a13:g:8cUAAOSwDNdVp7-1    80
 I AM HERE http://www.ebay.com/itm/Caterpillar-Cat-Multi-Terrain-Skid-Steer-Loader-Suspension-seat-cushion-kit-/360880511219?hash=item54062798f3:m:m5Tt8bBvIax8MVfT4VqcQgA  81
 I AM HERE http://www.ebay.com/itm/Case-310G-Crawler-Tractor-4pc-Seat-Cushion-set-/361381166532?hash=item5423fefdc4:g:hzAAAOSwSdZWdHZS  82
 I AM HERE http://www.ebay.com/itm/International-IH-500-OLD-Crawler-Dozer-Seat-cushions-/110598250697?hash=item19c02b60c9:g:DQ0AAMXQTT9RwIuh    83
 I AM HERE http://www.ebay.com/itm/Caterpillar-Cat-Excavator-Locking-Fuel-Cap-0963100-key-E110-E120-E70B-E110B-312-/110702080613?hash=item19c65bb265:g:pLwAAOxy2YtRwx2L     84
 I AM HERE http://www.ebay.com/itm/Fuel-Cap-Case-Backhoe-Skid-Steer-1845c-1845-1840-1835-1835b-1835c-diesel-or-gas-/132102578719?hash=item1ec1ed621f:g:~IcAAOSwgZ1Xvyk9     85
 I AM HERE http://www.ebay.com/itm/87433897-New-Case-580SL-580SM-580SL-Series-1-2-Backhoe-Hydraulic-Pump-14-Spline-/112192774351?hash=item1a1f35e0cf:g:KQEAAOSwBLlVVP0X     86
 I AM HERE http://www.ebay.com/itm/Case-580K-580SK-580L-580SL-BACKHOE-Right-Door-Rear-Hinged-Window-Glass-R52882-/111777519523?hash=item1a067597a3:m:mUh405BlfpMRnDzu0J8qEEw    87
 I AM HERE http://www.ebay.com/itm/Case-backhoe-door-spring-580E-580K-580SK-580SL-580SL-SERIES-2-580L-F44881-/111485899971?hash=item19f513d4c3:m:mpgpGQ1o0j_2ewhNIMMA53w    88
 I AM HERE http://www.ebay.com/itm/FORD-555E-575E-655E-675E-BACKHOE-GLASS-WINDOW-DOOR-LOWER-LH-85801625-/111002325387?hash=item19d841118b:g:HrIAAMXQySpRL9SJ    89
 I AM HERE http://www.ebay.com/itm/International-Dresser-TD8E-Dozer-4pc-Seat-Cushion-set-TD8C-IH-/131522416031?hash=item1e9f58cd9f:g:qC0AAOSwqBJXUJIL   90
 I AM HERE http://www.ebay.com/itm/John-Deere-450G-550G-650G-Crawler-Dozer-Operators-Manual-Maintenance-OMT163974-/132190364513?hash=item1ec728e361:g:lUAAAOxygPtS59xJ  91
 I AM HERE http://www.ebay.com/itm/Heavy-Construction-Equipment-key-set-excavator-bull-dozer-broom-forklift-loaders-/110751342295?hash=item19c94b5ed7:m:mm5Eephzc48HDdiNjCCaxtg     92
 I AM HERE http://www.ebay.com/itm/International-IH-Dresser-TD15B-TD15C-Crawler-Loader-Seat-Cushion-set-4-pieces-/111731372191?hash=item1a03b5709f:g:TrAAAOSwDNdVu5He   93
 I AM HERE http://www.ebay.com/itm/Caterpillar-Cat-Skid-Steer-loader-Suspension-COMPLETE-Seat-247-247B-more-/131069185959?hash=item1e84550fa7:g:kQYAAOxy4dNSqIYD    94
 I AM HERE http://www.ebay.com/itm/John-Deere-JD-Loader-Backhoe-710D-310g-310E-310J-310K-Hydraulic-charge-Pump-/131129131733?hash=item1e87e7c2d5:g:zFQAAOxy9eVRJ9cw     95
 I AM HERE http://www.ebay.com/itm/Case-480E-480ELL-LANDSCAPE-Backhoe-4x4-4wd-FRONT-RIM-wheel-New-D126930-12-X-16-5-/360913564299?hash=item54081ff28b:m:mYte9AXdktKLD9H-HOFJthQ     96
 I AM HERE http://www.ebay.com/itm/Bobcat-763F-763-Operation-Maintenance-Manual-operator-owner-6900788-/360337555830?hash=item53e5cac176:g:4IQAAOxy4dNSxZHP     97
 I AM HERE http://www.ebay.com/itm/Bobcat-753H-753-H-Service-Manual-Book-Skid-steer-loader-6900090-/131522633242?hash=item1e9f5c1e1a:g:1JEAAOxyUrZS-j4Q     98
 I AM HERE http://www.ebay.com/itm/John-Deere-JD-550-Crawler-Dozer-Parts-Manual-PC1437-/131985496504?hash=item1ebaf2d9b8:g:GkIAAOSwPgxVLR7f     99
 I AM HERE http://www.ebay.com/itm/Case-IH-580D-580SE-580SD-Backhoe-Rear-Closure-Panel-Cab-Glass-Window-CG3116-NEW-/111070117033?hash=item19dc4b7ca9:g:jHEAAOxykVNRwL34     100

I AM NEXT PAGE

2017-05-13 22:44:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.ebay.com/itm/Case-IH-580D-580SE-580SD-Backhoe-Rear-Closure-Panel-Cab-Glass-Window-CG3116-NEW-/111070117033?hash=item19dc4b7ca9:g:jHEAAOxykVNRwL34> (referer: http://www.ebay.com/sch/m.html?item=132127244893&_ssn=hfinney&_pgn=2&_skc=50&rt=nc)


    Iam in parselink    3
2017-05-13 22:44:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.ebay.com/sch/m.html?item=132127244893&_ssn=hfinney&_pgn=3&_skc=100&rt=nc> (referer: http://www.ebay.com/sch/m.html?item=132127244893&_ssn=hfinney&_pgn=2&_skc=50&rt=nc)
 I AM HERE http://www.ebay.com/itm/John-Deere-Hitachi-Zaxis-110-120-160-200-225-230-Alternator-1812005304-Excavator-/360495635483?hash=item53ef36dc1b:m:mqifohjA-IWXcIg_oWMee1Q     101
 I AM HERE http://www.ebay.com/itm/CAT-Caterpillar-955H-Crawler-Loader-Dozer-Parts-Manual-Book-NEW-60A8413-and-up-/361855690487?hash=item54404


EDIT: settings.py

# -*- coding: utf-8 -*-

# Scrapy settings for ebay project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     http://doc.scrapy.org/en/latest/topics/settings.html
#     http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
#     http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'ebay'

SPIDER_MODULES = ['ebay.spiders']
NEWSPIDER_MODULE = 'ebay.spiders'


# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'ebay (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See http://scrapy.readthedocs.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#   'Accept-Language': 'en',
#}

# Enable or disable spider middlewares
# See http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'ebay.middlewares.EbaySpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'ebay.middlewares.MyCustomDownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See http://scrapy.readthedocs.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See http://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html
#ITEM_PIPELINES = {
#    'ebay.pipelines.EbayPipeline': 300,
#}

# Enable and configure the AutoThrottle extension (disabled by default)
# See http://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

items.py

import scrapy
from scrapy.item import Item, Field



class EbayItem(scrapy.Item):
    NAME = scrapy.Field()
    MPN = scrapy.Field()
    ITEMID = scrapy.Field()
    PRICE = scrapy.Field()
    FREIGHT_1_for_quan_1 = scrapy.Field()
    FREIGHT_2_for_quan_2 = scrapy.Field()
    DATE = scrapy.Field()
    QUANTITY = scrapy.Field()
    CATAGORY = scrapy.Field()
    SUBCATAGORY = scrapy.Field()
    SUBCHILDCATAGORY = scrapy.Field()

pipelines.py (although I have not touched this file)

class EbayPipeline(object):
    def process_item(self, item, spider):
        return item

middlewares.py (I have not touched this file either)

from scrapy import signals


class EbaySpiderMiddleware(object):
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the spider middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_spider_input(self, response, spider):
        # Called for each response that goes through the spider
        # middleware and into the spider.

        # Should return None or raise an exception.
        return None

    def process_spider_output(self, response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.

        # Must return an iterable of Request, dict or Item objects.
        for i in result:
            yield i

    def process_spider_exception(self, response, exception, spider):
        # Called when a spider or process_spider_input() method
        # (from other spider middleware) raises an exception.

        # Should return either None or an iterable of Response, dict
        # or Item objects.
        pass

    def process_start_requests(self, start_requests, spider):
        # Called with the start requests of the spider, and works
        # similarly to the process_spider_output() method, except
        # that it doesn’t have a response associated.

        # Must return only requests (not items).
        for r in start_requests:
            yield r

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)

Solution: no fix needed; it seems to be working fine.

I quickly ran your code (with only slight modifications, such as removing the global variables and replacing EbayItem) and it works fine: it visits all the URLs you are creating.

Explanation / What's going on here:

I suspect your scraper is scheduling the URLs in an order that makes it appear as if it is not visiting all the links. It will visit them, only later.

I suspect you have set CONCURRENT_REQUESTS = 2. That would explain why Scrapy picks only 2 of the 51 queued URLs (50 product links plus the next-page link) to process next. One of these 2 URLs is the next-page URL, which creates another 51 requests. These new requests push the 49 older requests further back in the queue, and so on until there are no more next-page links.

If you run the scraper long enough, you will see that all links are visited sooner or later. Most probably the 49 "missing" requests that were created first will be visited last.
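The ordering described above can be reproduced with a small toy model. This is only a sketch, not Scrapy's actual scheduler: it assumes a last-in-first-out pending queue (Scrapy's documented default crawl order is depth-first, backed by a LIFO queue) and the concurrency of 2 guessed at above. The function name crawl_order and its parameters are made up for illustration.

```python
from collections import deque


def crawl_order(pages, links_per_page, concurrency, lifo=True):
    """Return the order in which a toy scheduler hands out requests.

    Requests are tuples: ("list", page_number) or ("item", item_number).
    """
    pending = deque([("list", 1)])
    visited = []
    while pending:
        # Take up to `concurrency` requests from the queue in one round.
        batch_size = min(concurrency, len(pending))
        batch = [pending.pop() if lifo else pending.popleft()
                 for _ in range(batch_size)]
        for kind, n in batch:
            visited.append((kind, n))
            if kind == "list":
                # A listing page yields its product links first ...
                first = (n - 1) * links_per_page + 1
                for item in range(first, first + links_per_page):
                    pending.append(("item", item))
                # ... and the next-page request last, as parse() does.
                if n < pages:
                    pending.append(("list", n + 1))
    return visited


order = crawl_order(pages=3, links_per_page=50, concurrency=2)
print(order[:5])   # start of the crawl
print(order[-1])   # the very last request to be served
```

In this model the crawl starts with listing 1, listing 2, item 50, listing 3, item 100, which mirrors the pattern in the log above (one product page per listing page, then the next listing page). Item 1 is served last, and all 150 items are eventually visited.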

You can also remove the creation of the next_page request to check whether all 50 links are visited.
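If the goal is to have product pages processed in roughly the order they were found, rather than only confirming that they are all visited eventually, one option is to switch Scrapy to breadth-first order. The fragment below is a sketch of what could be added to settings.py; these three settings are the ones the Scrapy FAQ lists for breadth-first crawling, but whether they give the order you want for this particular spider is an assumption to verify by running it.

```python
# settings.py (fragment): crawl in breadth-first order instead of the
# default depth-first order, so earlier requests are served first.
DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = 'scrapy.squeues.PickleFifoDiskQueue'
SCHEDULER_MEMORY_QUEUE = 'scrapy.squeues.FifoMemoryQueue'
```

With FIFO queues the next-page request, which parse yields last, is also served after the product links of the same page, so pagination proceeds more slowly while each page's items are handled first.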
