
Create new variables/class instances inside for-loop? Python web scraping

I am currently working on a web scraper that takes urls as inputs, finds each page, scrapes it, then returns the results in a CSV. The scraper works well for a single URL at a time. But unfortunately, whenever it writes a new line to the results CSV, it also appends the previous url's scrape results in each column. I need a loop that essentially creates a new class instance on each iteration so that this doesn't happen. Something like this: take the list of urls, then create a unique class instance for each one:

links = ['www.SomeLink1.com','www.Somelink2.com','www.SomeLink3.com']


person1 = Person('www.SomeLink1.com', driver = driver, close_on_complete = False)
person2 = Person('www.Somelink2.com', driver = driver, close_on_complete = False)
person3 = Person('www.SomeLink3.com', driver = driver, close_on_complete = False) 
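
In loop form, that would look something like the sketch below (a sketch only, assuming each Person() call returns an independent object, which is exactly what doesn't seem to be happening):

people = []
for link in links:
    # each iteration should construct a fresh, independent Person instance
    people.append(Person(link, driver=driver, close_on_complete=False))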

I do not have access to the source code, so I cannot add a new method like person1.reset() or similar.

Here is also the original code I was using to scrape multiple pages:

# Import libraries
from linkedin_scraper import Person, actions
from selenium import webdriver
import csv
import os
import pandas as pd
import numpy as np
import smtplib

# Read-in list of contacts:
contacts = pd.read_csv("/Users/Desktop/ContactList.csv")
names = contacts['contact_name'].tolist()
urls = contacts['contact_url'].tolist()
# turn contacts list into dictionary just in case
contact_dict = {names[i]: urls[i] for i in range(len(names))}
print(contact_dict)
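# note: dict(zip(names, urls)) builds the same mapping more concisely
# (assumes each contact name is unique, since duplicate keys would collide)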

# automatically login to LinkedIn
driver = webdriver.Chrome('/Users/Downloads/chromedriver')
email = os.environ.get('LINKEDIN_USER')
password = os.environ.get('LINKEDIN_PASS')
actions.login(driver, email, password)

# create general field names
fields = ['name', 'about', 'job_title', 'location','company',
          'education','accomplishments','linkedin_url']

# write the header row once; newline='' avoids blank lines in the CSV
with open('ScrapeResults.csv', 'w', newline='') as f:
    write = csv.writer(f)
    write.writerow(fields)
# no explicit f.close() needed: the with-block closes the file automatically

# Loop-through urls to scrape multiple pages at once
for individual,link in contact_dict.items():

    ## assign ##
    the_name = individual
    the_link = link
    # scrape peoples url:
    person = Person(the_link, driver=driver, close_on_complete=False)

    # row to be written (nested in an outer list because writerows expects a list of rows)
    rows = [[person.name, person.about, person.job_title, person.location, person.company,
             person.educations, person.accomplishments, person.linkedin_url]]
    # append this person's row; the with-block closes the file, so no f.close()
    with open('ScrapeResults.csv', 'a', newline='') as f:
        write = csv.writer(f)
        write.writerows(rows)
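
For what it's worth, the repeated open/append/close per iteration can be avoided by keeping the file open for the whole loop. A minimal restructuring (same fields and Person API as above; this is not itself a fix for the accumulation problem) would be:

with open('ScrapeResults.csv', 'w', newline='') as f:
    write = csv.writer(f)
    write.writerow(fields)
    for individual, link in contact_dict.items():
        # scrape one profile per iteration and write its row immediately
        person = Person(link, driver=driver, close_on_complete=False)
        write.writerow([person.name, person.about, person.job_title,
                        person.location, person.company, person.educations,
                        person.accomplishments, person.linkedin_url])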

Could you try instantiating a new driver each time? That should reset the counters in driver for you.

for individual, link in contact_dict.items():
    the_name = individual
    the_link = link
    driver = Driver()  # I don't know how to instantiate this
    person = Person(the_link, driver=driver, close_on_complete=False)

Without access to the driver documentation, I cannot say how to properly instantiate it. It might even have a helper such as clear() or reset() for its internal variables, which would be preferable to recreating the driver from scratch. In any case, the scraper should have straightforward documentation for this.
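
Since the question already builds the driver with Selenium's webdriver.Chrome and logs in via actions.login, a fresh session per iteration might look like the sketch below. This is an assumption on my part, and it will be slower, because each new session has to log in again:

for individual, link in contact_dict.items():
    # brand-new browser session, so no state can carry over between profiles
    driver = webdriver.Chrome('/Users/Downloads/chromedriver')
    actions.login(driver, email, password)  # re-authenticate the new session
    person = Person(link, driver=driver, close_on_complete=False)
    # ... write this person's fields to the CSV as before ...
    driver.quit()  # tear the session down before the next iteration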

I got in touch with the creator of the "linkedin_scraper" library. He fixed a bug that cached the previous LinkedIn profile's values and accumulated them when scraping multiple profiles at once.

Issue resolved in version 2.7.5.

Please see: https://github.com/joeyism/linkedin_scraper/issues/84
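
Upgrading the package with pip should pick up the fixed version:

pip install --upgrade linkedin_scraper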

Thanks all!
