Selenium - How to click all the More buttons of each individual item to scrape the data from the dropdown
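I am trying to scrape information from a page, but when I open the exported CSV file it is blank apart from the header row.
I am trying to scrape the 10 results on this page: https://www.narpm.org/find/property-managers/?submitted=true&toresults=1&resultsperpage=10&a=managers&orderby=&fname=&lname=&company=&chapter=S005&city=&state=&xRadius=
I can scrape the name, company, city, and state, but clicking the "More" dropdown does not seem to work. (No error is raised; the CSV is just blank.)
I suspect the problem is in this line of code: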
driver.find_element_by_xpath('//div[@class="col-md-4 col-lg-1 arrow"]').click()
Here is all of my code:
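import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.headless = True
driver = webdriver.Chrome(executable_path='/Users/vilje/anaconda3/envs/webscrape/chromedriver', options=options)
driver.set_window_size(1440, 900)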
# Creates master dataframe
df = pd.DataFrame(columns=['Name','Company', 'City', 'State', 'Phone', 'About'])
# URL
driver.get('https://www.narpm.org/find/property-managers/?submitted=true&toresults=1&resultsperpage=10&a=managers&orderby=&fname=&lname=&company=&chapter=S005&city=&state=&xRadius=')
name = driver.find_elements_by_xpath('//span[@class="name"]')
company = driver.find_elements_by_xpath('//div[@class="col-md-6 col-lg-4"]')
city = driver.find_elements_by_xpath('//div[@class="col-md-4 col-lg-2"]')
state = driver.find_elements_by_xpath('//div[@class="col-md-4 col-lg-2"]')
# Expand the 'More' button
driver.find_element_by_xpath('//div[@class="col-md-4 col-lg-1 arrow"]').click()
phone = driver.find_elements_by_xpath('//div[@class="col-sm-6 col-lg-3 with-icon lighter-text"]')
about = driver.find_elements_by_xpath('//div[@class="col-sm-12"]')
name_list = []
for n in range(len(name)):
    name_list.append(name[n].text)
company_list = []
for c in range(len(company)):
    company_list.append(company[c].text)
city_list = []
for c in range(len(city)):
    city_list.append(city[c].text)
state_list = []
for s in range(len(state)):
    state_list.append(state[s].text)
phone_list = []
for p in range(len(phone)):
    phone_list.append(phone[p].text)
about_list = []
for a in range(len(about)):
    about_list.append(about[a].text)
# List of each property manager's name, company, city, state, phone and about section paired together
data_tuples = list(zip(name_list[0:], company_list[0:], city_list[0:], state_list[0:], phone_list[0:], about_list[0:]))
# Creates dataframe of each tuple in list
temp_df = pd.DataFrame(data_tuples, columns=['Name','Company', 'City', 'State', 'Phone', 'About'])
# Appends to master dataframe
df = df.append(temp_df)
driver.close()
Can anyone help me click the "More" button for every person, so that I can scrape the data from the dropdown?
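One plausible reason the file contains only the header row, even though names and companies are found: zip() stops at its shortest input, so a single empty list (for example a phone lookup that matches nothing) empties data_tuples entirely. The sketch below illustrates this with made-up values; the sample names and the use of itertools.zip_longest are assumptions for illustration, not part of the original code.

```python
from itertools import zip_longest

# Hypothetical stand-ins for the lists built in the question's loops.
name_list = ["Alice Example", "Bob Example"]
company_list = ["Example Realty", "Sample Homes"]
phone_list = []  # what you get when the phone lookup matches nothing

# zip() truncates to the shortest input, so one empty list empties the result.
rows_zip = list(zip(name_list, company_list, phone_list))

# zip_longest() pads missing values instead, which keeps partial rows visible.
rows_padded = list(zip_longest(name_list, company_list, phone_list, fillvalue=""))

print(len(rows_zip))    # 0
print(rows_padded[0])   # ('Alice Example', 'Example Realty', '')
```

Padding can hide rows that are misaligned, so printing the length of each list before zipping is a quick way to see which lookup came back empty.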
To click every element whose text is More, induce WebDriverWait for visibility_of_all_elements_located() and use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://www.narpm.org/find/property-managers/?submitted=true&toresults=1&resultsperpage=10&a=managers&orderby=&fname=&lname=&company=&chapter=S005&city=&state=&xRadius=")
for more in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.row div.arrow"))):
    more.click()
Using XPATH:
driver.get("https://www.narpm.org/find/property-managers/?submitted=true&toresults=1&resultsperpage=10&a=managers&orderby=&fname=&lname=&company=&chapter=S005&city=&state=&xRadius=")
for more in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='row']//div[contains(@class, 'arrow') and contains(., 'More')]"))):
    more.click()
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
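The question's find_element_by_xpath call returns only the first match, so at most one panel gets expanded, whereas the loops above visit every matching element. Those loops call click() directly, which can raise if a row is off-screen or covered by another element. As a hedged sketch, the helper below (click_all is a hypothetical name) scrolls each element into view before clicking and reports how many clicks succeeded. It assumes driver is a Selenium WebDriver and elements is the list returned by the wait; the broad except is for illustration only.

```python
def click_all(driver, elements):
    """Scroll each element into view, click it, and return the number of successful clicks."""
    clicked = 0
    for element in elements:
        try:
            # Assumption: centering the element reduces the chance that a
            # sticky header or overlay intercepts the click.
            driver.execute_script(
                "arguments[0].scrollIntoView({block: 'center'});", element
            )
            element.click()
            clicked += 1
        except Exception as exc:  # illustrative; narrow this in real code
            print(f"skipped one element: {exc!r}")
    return clicked
```

With the wait from the answer, this could be called as click_all(driver, WebDriverWait(driver, 20).until(...)). The phone and about lookups should run only after the clicks, since that content is expected to appear once the panels are expanded.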