
Python CSV writer only writing last scraped item processed

So my scraper is only sending the last two items to the CSV, from the last page it processed. I can't figure out where I'm going wrong; it prints the output perfectly fine. Maybe an experienced set of eyes can help.

Code below:

from requests_html import HTMLSession
import csv
import time


def get_links(url):
    _request = _session.get(url)
    items = _request.html.find('li.product-grid-view.product.sale')
    links = []
    for item in items:
        links.append(item.find('a', first=True).attrs['href'])

    # print(len(links))

    return links


def get_product(link):
    _request = _session.get(link)

    title = _request.html.find('h2', first=True).full_text
    price = _request.html.find('span.woocommerce-Price-amount.amount bdi')[1].full_text
    sku = _request.html.find('span.sku', first=True).full_text
    categories = _request.html.find('span.posted_in', first=True).full_text.replace('Categories:', "").strip()
    brand = _request.html.find('span.posted_in')[1].full_text.replace('Brand:', "").strip()
    # print(brand)

    product = {
        'Title': title,
        'Price': price,
        'SKU': sku,
        'Categories': categories,
        'Brand': brand
    }

    # print(product)
    return product


if __name__ == '__main__':
    for page in range(1, 4):

        url = 'https://www.thebassplace.com/product-category/basses/4-string/'

        if page == 1:
            parse_url = url
        else:
            parse_url = f'https://www.thebassplace.com/product-category/basses/4-string/page/{page}/'

        _session = HTMLSession()

        links = get_links(parse_url)
        results = []

        for link in links:
            results.append(get_product(link))
            time.sleep(1)
            # print(len(results))


with open('on_sale_bass.csv', 'w', newline='', encoding='utf-8') as csv_file:
    
    writer = csv.DictWriter(csv_file, fieldnames=results[0].keys())
    writer.writeheader()

    for row in results:
        writer.writerow(row)

When I open the file in append mode instead, records from every page are written to the CSV, but the header row is repeated for each page iteration.
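For the append-mode variant, a common pattern is to write the header only when the file is new or empty. A minimal sketch under that assumption (the `append_rows` helper name is illustrative, not from the original code):

```python
import csv
import os


def append_rows(path, rows):
    # Write the header only when the file doesn't exist yet (or is empty),
    # so opening in append mode repeatedly doesn't duplicate the header.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling `append_rows` once per page then yields a single header followed by all rows.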

The problem was the statement results = [] inside the range loop. You emptied results on each iteration of the range(1, 4) loop, so you only kept what the last iteration brought in.
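The effect can be seen in isolation with two toy functions (names are illustrative):

```python
def collect_wrong(pages):
    for page in pages:
        results = []          # re-bound every iteration -> earlier data lost
        results.extend(page)
    return results            # only holds the last page's items


def collect_right(pages):
    results = []              # created once, before the loop
    for page in pages:
        results.extend(page)
    return results            # accumulates items from every page
```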

Note that I made _session global here, but in this case it would also be reasonable, in my opinion (feel free to correct me), to just pass it between functions. Now, try this out.

from requests_html import HTMLSession
import csv
import time


def get_links(url):
    global _session
    _request = _session.get(url)
    items = _request.html.find('li.product-grid-view.product.sale')
    links = []
    for item in items:
        links.append(item.find('a', first=True).attrs['href'])
    return links


def get_product(link):
    global _session
    _request = _session.get(link)
    title = _request.html.find('h2', first=True).full_text
    price = _request.html.find('span.woocommerce-Price-amount.amount bdi')[1].full_text
    sku = _request.html.find('span.sku', first=True).full_text
    categories = _request.html.find('span.posted_in', first=True).full_text.replace('Categories:', "").strip()
    brand = _request.html.find('span.posted_in')[1].full_text.replace('Brand:', "").strip()
    product = {
        'Title': title,
        'Price': price,
        'SKU': sku,
        'Categories': categories,
        'Brand': brand
    }
    return product


if __name__ == '__main__':
    results = []
    for page in range(1, 4):
        url = 'https://www.thebassplace.com/product-category/basses/4-string/'
        if page == 1:
            parse_url = url
        else:
            parse_url = f'https://www.thebassplace.com/product-category/basses/4-string/page/{page}/'
    
        _session = HTMLSession()
        links = get_links(parse_url)

        for link in links:
            product = get_product(link)
            results.append(product)
            #time.sleep(1)
            
    with open('on_sale_bass.csv', 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=results[0].keys())
        writer.writeheader()
        for row in results:
            writer.writerow(row)
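As a side note, the if/else that builds the per-page URL can be collapsed into a small helper. This is just a sketch assuming WooCommerce-style pagination (the `page_url` name is illustrative):

```python
def page_url(base, page):
    # WooCommerce-style pagination: page 1 is the bare category URL,
    # subsequent pages append 'page/<n>/' to it.
    return base if page == 1 else f'{base}page/{page}/'
```

With that, the loop body reduces to parse_url = page_url(url, page).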

What I get, as an example:

[screenshot of the resulting CSV]
