
Python CSV writer only writing last scraped item processed

So my scraper is only sending the last two items to the CSV, from the last page it processed. I can't figure out where I'm going wrong; it prints the output perfectly fine. Maybe an experienced set of eyes can help.

Code below:

from requests_html import HTMLSession
import csv
import time


def get_links(url):
    _request = _session.get(url)
    items = _request.html.find('li.product-grid-view.product.sale')
    links = []
    for item in items:
        links.append(item.find('a', first=True).attrs['href'])

    # print(len(links))

    return links


def get_product(link):
    _request = _session.get(link)

    title = _request.html.find('h2', first=True).full_text
    price = _request.html.find('span.woocommerce-Price-amount.amount bdi')[1].full_text
    sku = _request.html.find('span.sku', first=True).full_text
    categories = _request.html.find('span.posted_in', first=True).full_text.replace('Categories:', "").strip()
    brand = _request.html.find('span.posted_in')[1].full_text.replace('Brand:', "").strip()
    # print(brand)

    product = {
        'Title': title,
        'Price': price,
        'SKU': sku,
        'Categories': categories,
        'Brand': brand
    }

    # print(product)
    return product


if __name__ == '__main__':
    for page in range(1, 4):

        url = 'https://www.thebassplace.com/product-category/basses/4-string/'

        if page == 1:
            parse_url = url
        else:
            parse_url = f'https://www.thebassplace.com/product-category/basses/4-string/page/{page}/'

        _session = HTMLSession()

        links = get_links(parse_url)
        results = []

        for link in links:
            results.append(get_product(link))
            time.sleep(1)
            # print(len(results))


with open('on_sale_bass.csv', 'w', newline='', encoding='utf-8') as csv_file:
    
    writer = csv.DictWriter(csv_file, fieldnames=results[0].keys())
    writer.writeheader()

    for row in results:
        writer.writerow(row)

When I open the file in append mode instead, records from every page are written to the CSV, but the header row is repeated for each page iteration.
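For the append-mode variant, a common pattern is to write the header only when the file is new or empty. A minimal sketch under that assumption (the `append_rows` helper name is illustrative, not from the original code):

```python
import csv
import os


def append_rows(path, rows):
    # Write the header only when the file doesn't exist yet (or is empty),
    # so opening in append mode repeatedly doesn't duplicate the header.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling `append_rows` once per page then yields a single header followed by all rows.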

The problem was the statement results = [] inside the range loop. You emptied results on each iteration of the range(1, 4) loop, so you only kept what the last iteration brought in.
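The effect can be seen in isolation with two toy functions (names are illustrative):

```python
def collect_wrong(pages):
    for page in pages:
        results = []          # re-bound every iteration -> earlier data lost
        results.extend(page)
    return results            # only holds the last page's items


def collect_right(pages):
    results = []              # created once, before the loop
    for page in pages:
        results.extend(page)
    return results            # accumulates items from every page
```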

Note that I made _session global here, but in this case it would also be reasonable, in my opinion (feel free to correct me), to just pass it between functions. Now, try this out.

from requests_html import HTMLSession
import csv
import time


def get_links(url):
    global _session
    _request = _session.get(url)
    items = _request.html.find('li.product-grid-view.product.sale')
    links = []
    for item in items:
        links.append(item.find('a', first=True).attrs['href'])
    return links


def get_product(link):
    global _session
    _request = _session.get(link)
    title = _request.html.find('h2', first=True).full_text
    price = _request.html.find('span.woocommerce-Price-amount.amount bdi')[1].full_text
    sku = _request.html.find('span.sku', first=True).full_text
    categories = _request.html.find('span.posted_in', first=True).full_text.replace('Categories:', "").strip()
    brand = _request.html.find('span.posted_in')[1].full_text.replace('Brand:', "").strip()
    product = {
        'Title': title,
        'Price': price,
        'SKU': sku,
        'Categories': categories,
        'Brand': brand
    }
    return product


if __name__ == '__main__':
    results = []
    for page in range(1, 4):
        url = 'https://www.thebassplace.com/product-category/basses/4-string/'
        if page == 1:
            parse_url = url
        else:
            parse_url = f'https://www.thebassplace.com/product-category/basses/4-string/page/{page}/'
    
        _session = HTMLSession()
        links = get_links(parse_url)

        for link in links:
            product = get_product(link)
            results.append(product)
            #time.sleep(1)
            
    with open('on_sale_bass.csv', 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=results[0].keys())
        writer.writeheader()
        for row in results:
            writer.writerow(row)
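As a side note, the if/else that builds the per-page URL can be collapsed into a small helper. This is just a sketch assuming WooCommerce-style pagination (the `page_url` name is illustrative):

```python
def page_url(base, page):
    # WooCommerce-style pagination: page 1 is the bare category URL,
    # subsequent pages append 'page/<n>/' to it.
    return base if page == 1 else f'{base}page/{page}/'
```

With that, the loop body reduces to parse_url = page_url(url, page).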

What I get, as an example:

[screenshot of the resulting CSV]
