How can I scrape data from multiple URLs and save the data in the same CSV file?
I am using BeautifulSoup to scrape the data. There are multiple URLs, and I have to save the data I scrape from these URLs in the same CSV file. When I scrape each URL separately and save to the same CSV file, only the data from the last URL I scraped ends up in the CSV file. Below is the piece of code that I scrape the data with.
import requests
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
from random import randint
from time import sleep

headers = {"User-Agent": "Mozilla/5.0"}  # request headers defined elsewhere in my script

images = []
pages = np.arange(1, 2, 1)
for page in pages:
    url = "https://www.bkmkitap.com/sanat"
    results = requests.get(url, headers=headers)
    soup = BeautifulSoup(results.content, "html.parser")
    book_div = soup.find_all("div", class_="col col-12 drop-down hover lightBg")
    sleep(randint(2, 10))
    for bookSection in book_div:
        img_url = bookSection.find("img", class_="lazy stImage").get('data-src')
        images.append(img_url)

books = pd.DataFrame({
    "Image": images,
})
books.to_csv("bkm_art.csv", index=False, header=True, encoding='utf-8-sig')
Your question isn't very clear. When you run this, I assume a CSV gets created with all the image URLs, and you want to rerun this same script and have other image URLs appended to the same CSV? If that is the case, then you only need to change the to_csv call to:
books.to_csv("bkm_art.csv", mode='a', index=False, header=False, encoding='utf-8-sig')
Adding mode='a' starts appending to the file instead of overwriting it ( doc ).
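One caveat with a fixed header=False: the very first run then produces a CSV with no header row at all. A common pattern (a sketch, reusing the same bkm_art.csv filename) is to write the header only when the file does not exist yet:

```python
import os
import pandas as pd

# Example rows standing in for the scraped image URLs.
books = pd.DataFrame({"Image": ["https://cdn.bkmkitap.com/example.jpg"]})

# Append to the CSV; emit the header row only on the first run,
# when the file does not exist yet.
file_exists = os.path.isfile("bkm_art.csv")
books.to_csv("bkm_art.csv", mode="a", index=False,
             header=not file_exists, encoding="utf-8-sig")
```

Rerunning the script then keeps appending rows under the single header written on the first run.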
The main issue in your example is that you do not get the second page, so you won't get those results - iterate over all of the pages and then create your CSV.
The second issue, appending data to an existing file, is figured out by @MB.
Note: Try to avoid selecting your elements by classes, because they are more dynamic than an id or the HTML structure.
import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}  # headers as used in the question

data = []
for page in range(1, 3, 1):
    url = f"https://www.bkmkitap.com/sanat?pg={page}"
    results = requests.get(url, headers=headers)
    soup = BeautifulSoup(results.content, "html.parser")
    for bookSection in soup.select('[id*="product-detail"]'):
        data.append({
            'image': bookSection.find("img", class_="lazy stImage").get('data-src')
        })

books = pd.DataFrame(data)
books.to_csv("bkm_art.csv", index=False, header=True, encoding='utf-8-sig')
image
0 https://cdn.bkmkitap.com/sanat-dunyamiz-190-ey...
1 https://cdn.bkmkitap.com/sanat-dunyamiz-189-te...
2 https://cdn.bkmkitap.com/tiyatro-gazetesi-sayi...
3 https://cdn.bkmkitap.com/mavi-gok-kultur-sanat...
4 https://cdn.bkmkitap.com/sanat-dunyamiz-iki-ay...
... ...
112 https://cdn.bkmkitap.com/hayal-perdesi-iki-ayl...
113 https://cdn.bkmkitap.com/cins-aylik-kultur-der...
114 https://cdn.bkmkitap.com/masa-dergisi-sayi-48-...
115 https://cdn.bkmkitap.com/istanbul-sanat-dergis...
116 https://cdn.bkmkitap.com/masa-dergisi-sayi-49-...
117 rows × 1 columns
import numpy as np

pages = np.arange(1, 2, 1)
for page in pages:
    print(page)

Try it, and you will find you just get 1, because the stop value of np.arange is exclusive. You can use the built-in range instead, e.g. pages = range(1, 3), to cover pages 1 and 2.
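A quick check makes the exclusive stop value visible - both np.arange(1, 2, 1) and range(1, 2, 1) yield only page 1, and the stop must be raised to 3 to also reach page 2:

```python
import numpy as np

# The stop value is exclusive for both np.arange and range.
only_first = list(np.arange(1, 2, 1))   # only [1]
same_thing = list(range(1, 2, 1))       # still only [1]
both_pages = list(range(1, 3))          # [1, 2] - pages 1 and 2

print(only_first, same_thing, both_pages)
```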
You can use the requests module of Python to request and scrape the data, and after that you can use pandas to convert it into a CSV file.
https://www.tutorialspoint.com/requests/requests_web_scraping_using_requests.html
pandas.to_csv() can be used.