Scraping Links Using Selenium From Multiple Pages

I am scraping links from a website directory. There are 13800 records across 690 pages, with 20 records per page, but I am only getting the links from the first and last pages. I need all profile links with names in a CSV file. Any help would be appreciated.

from selenium import webdriver
from selenium.common import exceptions
import pandas as pd

browser = webdriver.Chrome()
browser.get('https://jito.org/members')

name_list =[]
link_list = []

i = 0
while i < 10:
    try:
        results = browser.find_elements_by_xpath("//*[@class='name']")

        for directory in results:
            name = directory.text
            link = directory.find_element_by_tag_name('a')
            person_link = link.get_attribute("href")

            name_list.append(name)
            link_list.append(person_link)


        browser.find_element_by_css_selector("[title^='Next']").click()
        i += 1

    except exceptions.StaleElementReferenceException:
         pass

df = pd.DataFrame(list(zip(name_list, link_list)), columns=['Name', 'Link'])

JITO_data = df.to_csv('JITO_Directory.csv', index=False)

To extract the name and link from every page, you do not need Selenium. Use the Python requests module and Beautiful Soup to fetch and parse each page, then load the data into pandas and export it to CSV.

import requests
from bs4 import BeautifulSoup
import pandas as pd

name_list = []
link_list = []

# Each page shows 20 records; the last page starts at offset 13780.
for start in range(0, 13781, 20):
    res = requests.get("https://jito.org/members?start={}".format(start))
    soup = BeautifulSoup(res.text, "html.parser")
    # Each profile link is an <a> directly inside an element with class "name".
    for item in soup.select('.name>a'):
        name_list.append(item.text)
        link_list.append("https://jito.org" + item['href'])

df = pd.DataFrame({"Name": name_list, "Link": link_list})
df.to_csv('JITO_Directory.csv', index=False)
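
The last offset used in the code above, 13780, follows from the numbers in the question: 690 pages of 20 records give start offsets 0, 20, ..., 13780. Here is a minimal sketch of that arithmetic using only the standard library. The helper name `page_urls` and the constants are illustrative assumptions, not part of the original answer, and `urljoin` is shown only as one possible alternative to string concatenation for building absolute links.

```python
from urllib.parse import urljoin

BASE_URL = "https://jito.org"  # site root, as used in the answer
PER_PAGE = 20                  # records per page, from the question
TOTAL_PAGES = 690              # page count, from the question


def page_urls(total_pages=TOTAL_PAGES, per_page=PER_PAGE):
    """Return the listing URL of every page, in order (hypothetical helper)."""
    return [
        "{}/members?start={}".format(BASE_URL, page * per_page)
        for page in range(total_pages)
    ]


urls = page_urls()
print(len(urls))   # number of pages to request
print(urls[0])     # first page
print(urls[-1])    # last page

# urljoin resolves a site-relative href against the site root.
print(urljoin(BASE_URL, "/profile/14230-nilesh-parasmal-jain"))
```

Generating the URLs up front makes it easy to check the first and last offsets before sending any requests.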

Please note that if you do not have these libraries installed, you need to install them first.
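
One way to check which of the libraries are missing is sketched below. The helper `missing_packages` is a hypothetical name introduced here for illustration; the mapping lists each import name next to the name normally passed to `pip install`.

```python
import importlib.util

# Import name -> package name to pass to "pip install".
REQUIRED = {
    "requests": "requests",
    "bs4": "beautifulsoup4",
    "pandas": "pandas",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of required modules that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("All required libraries are available.")
```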

The generated CSV contains 13789 records.


Updated with print statements for troubleshooting. You can see the lists after each iteration, as well as the final DataFrame.

import requests
from bs4 import BeautifulSoup
import pandas as pd

name_list = []
link_list = []

# Each page shows 20 records; the last page starts at offset 13780.
for start in range(0, 13781, 20):
    url = "https://jito.org/members?start={}".format(start)
    print(url)
    res = requests.get(url)
    soup = BeautifulSoup(res.text, "html.parser")
    for item in soup.select('.name>a'):
        name_list.append(item.text)
        link_list.append("https://jito.org" + item['href'])
    # Print everything collected so far, after each page.
    print(name_list)
    print(link_list)

df = pd.DataFrame({"Name": name_list, "Link": link_list})
print(df)
df.to_csv('JITO_Directory.csv', index=False)
print('Done')

Updated print results:

https://jito.org/members?start=0
['NILESH PARASMAL JAIN', 'D K Surana', 'Surender Lal Jain', 'SANDEEP JAIN', 'Nitni Jain', 'KAMLESH CHANDMAL POKHARANA', 'JAYA KAILESH JAIN', 'Ashish Dhariwal', 'Ashok Banthia', 'YASHWANT JAIN', 'Sandeep Mansukhlal Mutha', 'Hamir Bankimbhai Jhaveri', 'Rushab Ajay Bora', 'Nimish Hasmukhbhai Chudgar', 'Kinnar Kantilal Shah', 'Amish Rajendrakumar Shah', 'Abdhishkumar Rajendrakumar Shah', 'Vineet  Gothi', 'Vinay Kumar Chhajer', 'Nirmal Kumar Dugar']
['https://jito.org/profile/14230-nilesh-parasmal-jain', 'https://jito.org/profile/14228-d-k-surana', 'https://jito.org/profile/14227-surender-lal-jain', 'https://jito.org/profile/14226-sandeep-jain', 'https://jito.org/profile/14225-nitni-jain', 'https://jito.org/profile/14224-kamlesh-chandmal-pokharana', 'https://jito.org/profile/14223-jaya-kailesh-jain', 'https://jito.org/profile/14222-ashish-dhariwal', 'https://jito.org/profile/14221-ashok-banthia', 'https://jito.org/profile/14220-yashwant-jain', 'https://jito.org/profile/14219-sandeep-mutha', 'https://jito.org/profile/14218-hamir-bankimbhai-jhaveri', 'https://jito.org/profile/14217-rushab-ajay-bora', 'https://jito.org/profile/14216-nimish-hasmukhbhai-chudgar', 'https://jito.org/profile/14215-kinnar-kantilal-shah', 'https://jito.org/profile/14214-amish-rajendrakumar-shah', 'https://jito.org/profile/14213-abdhishkumar-rajendrakumar-shah', 'https://jito.org/profile/14212-vineet-gothi', 'https://jito.org/profile/14211-vinay-kumar-chhajer', 'https://jito.org/profile/14210-nirmal-kumar-dugar']
https://jito.org/members?start=20
['NILESH PARASMAL JAIN', 'D K Surana', 'Surender Lal Jain', 'SANDEEP JAIN', 'Nitni Jain', 'KAMLESH CHANDMAL POKHARANA', 'JAYA KAILESH JAIN', 'Ashish Dhariwal', 'Ashok Banthia', 'YASHWANT JAIN', 'Sandeep Mansukhlal Mutha', 'Hamir Bankimbhai Jhaveri', 'Rushab Ajay Bora', 'Nimish Hasmukhbhai Chudgar', 'Kinnar Kantilal Shah', 'Amish Rajendrakumar Shah', 'Abdhishkumar Rajendrakumar Shah', 'Vineet  Gothi', 'Vinay Kumar Chhajer', 'Nirmal Kumar Dugar', 'Nikesh Kumar Jain', 'Ashok Kumar Jain', 'Amit Jain Rathod', 'Amar Kumar Jain', 'Ravi  Kothari', 'Moxesh Prakash Punamiya', 'Sourabh  Kothari', 'Ramesh Kumar Singhvi', 'Ramesh  Daglia', 'Rakesh  Bhanawat', 'Pushpendra  Nalwaya', 'Pritam  Jain', 'Pramod Kumar Mehta', 'Narendra Kumar Jain', 'Mayank  Patwa', 'Dharmendra  Mandot', 'Bhanwar Lal Porwal', 'Ashok Kumar  Porwal', 'Gajendra Kumar Shankar Lal Chandaliya', 'Girish  Jain']
['https://jito.org/profile/14230-nilesh-parasmal-jain', 'https://jito.org/profile/14228-d-k-surana', 'https://jito.org/profile/14227-surender-lal-jain', 'https://jito.org/profile/14226-sandeep-jain', 'https://jito.org/profile/14225-nitni-jain', 'https://jito.org/profile/14224-kamlesh-chandmal-pokharana', 'https://jito.org/profile/14223-jaya-kailesh-jain', 'https://jito.org/profile/14222-ashish-dhariwal', 'https://jito.org/profile/14221-ashok-banthia', 'https://jito.org/profile/14220-yashwant-jain', 'https://jito.org/profile/14219-sandeep-mutha', 'https://jito.org/profile/14218-hamir-bankimbhai-jhaveri', 'https://jito.org/profile/14217-rushab-ajay-bora', 'https://jito.org/profile/14216-nimish-hasmukhbhai-chudgar', 'https://jito.org/profile/14215-kinnar-kantilal-shah', 'https://jito.org/profile/14214-amish-rajendrakumar-shah', 'https://jito.org/profile/14213-abdhishkumar-rajendrakumar-shah', 'https://jito.org/profile/14212-vineet-gothi', 'https://jito.org/profile/14211-vinay-kumar-chhajer', 'https://jito.org/profile/14210-nirmal-kumar-dugar', 'https://jito.org/profile/14209-nikesh-kumar-jain', 'https://jito.org/profile/14208-ashok-kumar-jain', 'https://jito.org/profile/14207-amit-jain-rathod', 'https://jito.org/profile/14206-amar-kumar-jain', 'https://jito.org/profile/14205-ravi-kothari', 'https://jito.org/profile/14204-moxesh-prakash-punamiya', 'https://jito.org/profile/14203-sourabh-kothari', 'https://jito.org/profile/14202-ramesh-kumar-singhvi', 'https://jito.org/profile/14201-ramesh-daglia', 'https://jito.org/profile/14200-rakesh-bhanawat', 'https://jito.org/profile/14199-pushpendra-nalwaya', 'https://jito.org/profile/14198-pritam-jain', 'https://jito.org/profile/14197-pramod-kumar-mehta', 'https://jito.org/profile/14196-narendra-kumar-jain', 'https://jito.org/profile/14195-mayank-patwa', 'https://jito.org/profile/14194-dharmendra-mandot', 'https://jito.org/profile/14193-bhanwar-lal-porwal', 'https://jito.org/profile/14192-ashok-kumar-porwal', 
'https://jito.org/profile/14191-gajendra-kumar-shankar-lal-chandaliya', 'https://jito.org/profile/14190-girish-jain']
https://jito.org/members?start=40
['NILESH PARASMAL JAIN', 'D K Surana', 'Surender Lal Jain', 'SANDEEP JAIN', 'Nitni Jain', 'KAMLESH CHANDMAL POKHARANA', 'JAYA KAILESH JAIN', 'Ashish Dhariwal', 'Ashok Banthia', 'YASHWANT JAIN', 'Sandeep Mansukhlal Mutha', 'Hamir Bankimbhai Jhaveri', 'Rushab Ajay Bora', 'Nimish Hasmukhbhai Chudgar', 'Kinnar Kantilal Shah', 'Amish Rajendrakumar Shah', 'Abdhishkumar Rajendrakumar Shah', 'Vineet  Gothi', 'Vinay Kumar Chhajer', 'Nirmal Kumar Dugar', 'Nikesh Kumar Jain', 'Ashok Kumar Jain', 'Amit Jain Rathod', 'Amar Kumar Jain', 'Ravi  Kothari', 'Moxesh Prakash Punamiya', 'Sourabh  Kothari', 'Ramesh Kumar Singhvi', 'Ramesh  Daglia', 'Rakesh  Bhanawat', 'Pushpendra  Nalwaya', 'Pritam  Jain', 'Pramod Kumar Mehta', 'Narendra Kumar Jain', 'Mayank  Patwa', 'Dharmendra  Mandot', 'Bhanwar Lal Porwal', 'Ashok Kumar  Porwal', 'Gajendra Kumar Shankar Lal Chandaliya', 'Girish  Jain', 'Avinash  Jain', 'Vijay  Jain', 'Subhash  Sancheti', 'Rajesh Kumar  Golechha', 'Tejaswini Sudarshan Bafna', 'Swapnil Vilas  Shah', 'Sudeep Vijay Chhallani', 'Sanjay Bansilal Chordiya', 'Preeti Manoj Chhajed', 'Prakash Javerchand Oswal', 'Kiran Bachulal Rathod', 'Devendra Mangilal Bhansali', 'Anand Nitinbhai Mehta', 'Surya Prakash Chopra', 'Sanjay  Gemawat', 'Sangita Jain. Jain Lunker', 'Sham Lal Jain', 'Sanjay  Golecha', 'Manoj Kumar Jain', 'Yogesh Brijlalji Chopda']
['https://jito.org/profile/14230-nilesh-parasmal-jain', 'https://jito.org/profile/14228-d-k-surana', 'https://jito.org/profile/14227-surender-lal-jain', 'https://jito.org/profile/14226-sandeep-jain', 'https://jito.org/profile/14225-nitni-jain', 'https://jito.org/profile/14224-kamlesh-chandmal-pokharana', 'https://jito.org/profile/14223-jaya-kailesh-jain', 'https://jito.org/profile/14222-ashish-dhariwal', 'https://jito.org/profile/14221-ashok-banthia', 'https://jito.org/profile/14220-yashwant-jain', 'https://jito.org/profile/14219-sandeep-mutha', 'https://jito.org/profile/14218-hamir-bankimbhai-jhaveri', 'https://jito.org/profile/14217-rushab-ajay-bora', 'https://jito.org/profile/14216-nimish-hasmukhbhai-chudgar', 'https://jito.org/profile/14215-kinnar-kantilal-shah', 'https://jito.org/profile/14214-amish-rajendrakumar-shah', 'https://jito.org/profile/14213-abdhishkumar-rajendrakumar-shah', 'https://jito.org/profile/14212-vineet-gothi', 'https://jito.org/profile/14211-vinay-kumar-chhajer', 'https://jito.org/profile/14210-nirmal-kumar-dugar', 'https://jito.org/profile/14209-nikesh-kumar-jain', 'https://jito.org/profile/14208-ashok-kumar-jain', 'https://jito.org/profile/14207-amit-jain-rathod', 'https://jito.org/profile/14206-amar-kumar-jain', 'https://jito.org/profile/14205-ravi-kothari', 'https://jito.org/profile/14204-moxesh-prakash-punamiya', 'https://jito.org/profile/14203-sourabh-kothari', 'https://jito.org/profile/14202-ramesh-kumar-singhvi', 'https://jito.org/profile/14201-ramesh-daglia', 'https://jito.org/profile/14200-rakesh-bhanawat', 'https://jito.org/profile/14199-pushpendra-nalwaya', 'https://jito.org/profile/14198-pritam-jain', 'https://jito.org/profile/14197-pramod-kumar-mehta', 'https://jito.org/profile/14196-narendra-kumar-jain', 'https://jito.org/profile/14195-mayank-patwa', 'https://jito.org/profile/14194-dharmendra-mandot', 'https://jito.org/profile/14193-bhanwar-lal-porwal', 'https://jito.org/profile/14192-ashok-kumar-porwal', 
'https://jito.org/profile/14191-gajendra-kumar-shankar-lal-chandaliya', 'https://jito.org/profile/14190-girish-jain', 'https://jito.org/profile/14189-avinash-jain', 'https://jito.org/profile/14188-vijay-jain', 'https://jito.org/profile/14187-subhash-sancheti', 'https://jito.org/profile/14186-rajesh-kumar-golechha', 'https://jito.org/profile/14185-tejaswini-sudarshan-bafna', 'https://jito.org/profile/14184-swapnil-vilas-shah', 'https://jito.org/profile/14183-sudeep-vijay-chhallani', 'https://jito.org/profile/14182-sanjay-bansilal-chordiya', 'https://jito.org/profile/14181-preeti-manoj-chhajed', 'https://jito.org/profile/14180-prakash-javerchand-oswal', 'https://jito.org/profile/14179-kiran-bachulal-rathod', 'https://jito.org/profile/14178-devendra-mangilal-bhansali', 'https://jito.org/profile/14177-anand-nitinbhai-mehta', 'https://jito.org/profile/14176-surya-prakash-chopra', 'https://jito.org/profile/14175-sanjay-gemawat', 'https://jito.org/profile/14174-sangita-jain-jain-lunker', 'https://jito.org/profile/14173-sham-lal-jain', 'https://jito.org/profile/14172-sanjay-golecha', 'https://jito.org/profile/14171-manoj-kumar-jain', 'https://jito.org/profile/14170-yogesh-brijlalji-chopda']
https://jito.org/members?start=60
['NILESH PARASMAL JAIN', 'D K Surana', 'Surender Lal Jain', 'SANDEEP JAIN', 'Nitni Jain', 'KAMLESH CHANDMAL POKHARANA', 'JAYA KAILESH JAIN', 'Ashish Dhariwal', 'Ashok Banthia', 'YASHWANT JAIN', 'Sandeep Mansukhlal Mutha', 'Hamir Bankimbhai Jhaveri', 'Rushab Ajay Bora', 'Nimish Hasmukhbhai Chudgar', 'Kinnar Kantilal Shah', 'Amish Rajendrakumar Shah', 'Abdhishkumar Rajendrakumar Shah', 'Vineet  Gothi', 'Vinay Kumar Chhajer', 'Nirmal Kumar Dugar', 'Nikesh Kumar Jain', 'Ashok Kumar Jain', 'Amit Jain Rathod', 'Amar Kumar Jain', 'Ravi  Kothari', 'Moxesh Prakash Punamiya', 'Sourabh  Kothari', 'Ramesh Kumar Singhvi', 'Ramesh  Daglia', 'Rakesh  Bhanawat', 'Pushpendra  Nalwaya', 'Pritam  Jain', 'Pramod Kumar Mehta', 'Narendra Kumar Jain', 'Mayank  Patwa', 'Dharmendra  Mandot', 'Bhanwar Lal Porwal', 'Ashok Kumar  Porwal', 'Gajendra Kumar Shankar Lal Chandaliya', 'Girish  Jain', 'Avinash  Jain', 'Vijay  Jain', 'Subhash  Sancheti', 'Rajesh Kumar  Golechha', 'Tejaswini Sudarshan Bafna', 'Swapnil Vilas  Shah', 'Sudeep Vijay Chhallani', 'Sanjay Bansilal Chordiya', 'Preeti Manoj Chhajed', 'Prakash Javerchand Oswal', 'Kiran Bachulal Rathod', 'Devendra Mangilal Bhansali', 'Anand Nitinbhai Mehta', 'Surya Prakash Chopra', 'Sanjay  Gemawat', 'Sangita Jain. Jain Lunker', 'Sham Lal Jain', 'Sanjay  Golecha', 'Manoj Kumar Jain', 'Yogesh Brijlalji Chopda', 'Bipin R Shah Rasiklal Shah', 'Kalpesh Arvind Shah', 'Hemant Vishanji Dedhia', 'Manju Parasmal Golecha', 'Urmila Dilip Chandan', 'Ugamraj Misrimal Mehta', 'Surendra Madanmal Mehta', 'Shrenik Champalal Jain', 'Sanjay C Jain', 'Ratan Tarachand Mehta', 'Ramesh Sumermal Nahar', 'Rajesh Kumar Bhagchand Mehta', 'Milapchand Bhimraj Mehta', 'Mahendra Nemichand Bafna', 'Mahendra Kumar Tarachand Mehta', 'Lalit Okhraj Bokadia', 'Lalit Champalal Jain', 'Lakhpatraj Bhagchandji Mehta', 'Kushboo Chirag Chandan', 'Jaswant Bhagchand Mehta']
['https://jito.org/profile/14230-nilesh-parasmal-jain', 'https://jito.org/profile/14228-d-k-surana', 'https://jito.org/profile/14227-surender-lal-jain', 'https://jito.org/profile/14226-sandeep-jain', 'https://jito.org/profile/14225-nitni-jain', 'https://jito.org/profile/14224-kamlesh-chandmal-pokharana', 'https://jito.org/profile/14223-jaya-kailesh-jain', 'https://jito.org/profile/14222-ashish-dhariwal', 'https://jito.org/profile/14221-ashok-banthia', 'https://jito.org/profile/14220-yashwant-jain', 'https://jito.org/profile/14219-sandeep-mutha', 'https://jito.org/profile/14218-hamir-bankimbhai-jhaveri', 'https://jito.org/profile/14217-rushab-ajay-bora', 'https://jito.org/profile/14216-nimish-hasmukhbhai-chudgar', 'https://jito.org/profile/14215-kinnar-kantilal-shah', 'https://jito.org/profile/14214-amish-rajendrakumar-shah', 'https://jito.org/profile/14213-abdhishkumar-rajendrakumar-shah', 'https://jito.org/profile/14212-vineet-gothi', 'https://jito.org/profile/14211-vinay-kumar-chhajer', 'https://jito.org/profile/14210-nirmal-kumar-dugar', 'https://jito.org/profile/14209-nikesh-kumar-jain', 'https://jito.org/profile/14208-ashok-kumar-jain', 'https://jito.org/profile/14207-amit-jain-rathod', 'https://jito.org/profile/14206-amar-kumar-jain', 'https://jito.org/profile/14205-ravi-kothari', 'https://jito.org/profile/14204-moxesh-prakash-punamiya', 'https://jito.org/profile/14203-sourabh-kothari', 'https://jito.org/profile/14202-ramesh-kumar-singhvi', 'https://jito.org/profile/14201-ramesh-daglia', 'https://jito.org/profile/14200-rakesh-bhanawat', 'https://jito.org/profile/14199-pushpendra-nalwaya', 'https://jito.org/profile/14198-pritam-jain', 'https://jito.org/profile/14197-pramod-kumar-mehta', 'https://jito.org/profile/14196-narendra-kumar-jain', 'https://jito.org/profile/14195-mayank-patwa', 'https://jito.org/profile/14194-dharmendra-mandot', 'https://jito.org/profile/14193-bhanwar-lal-porwal', 'https://jito.org/profile/14192-ashok-kumar-porwal', 
'https://jito.org/profile/14191-gajendra-kumar-shankar-lal-chandaliya', 'https://jito.org/profile/14190-girish-jain', 'https://jito.org/profile/14189-avinash-jain', 'https://jito.org/profile/14188-vijay-jain', 'https://jito.org/profile/14187-subhash-sancheti', 'https://jito.org/profile/14186-rajesh-kumar-golechha', 'https://jito.org/profile/14185-tejaswini-sudarshan-bafna', 'https://jito.org/profile/14184-swapnil-vilas-shah', 'https://jito.org/profile/14183-sudeep-vijay-chhallani', 'https://jito.org/profile/14182-sanjay-bansilal-chordiya', 'https://jito.org/profile/14181-preeti-manoj-chhajed', 'https://jito.org/profile/14180-prakash-javerchand-oswal', 'https://jito.org/profile/14179-kiran-bachulal-rathod', 'https://jito.org/profile/14178-devendra-mangilal-bhansali', 'https://jito.org/profile/14177-anand-nitinbhai-mehta', 'https://jito.org/profile/14176-surya-prakash-chopra', 'https://jito.org/profile/14175-sanjay-gemawat', 'https://jito.org/profile/14174-sangita-jain-jain-lunker', 'https://jito.org/profile/14173-sham-lal-jain', 'https://jito.org/profile/14172-sanjay-golecha', 'https://jito.org/profile/14171-manoj-kumar-jain', 'https://jito.org/profile/14170-yogesh-brijlalji-chopda', 'https://jito.org/profile/14169-bipin-r-shah-rasiklal-shah', 'https://jito.org/profile/14168-kalpesh-arvind-shah', 'https://jito.org/profile/14167-hemant-vishanji-dedhia', 'https://jito.org/profile/14166-manju-parasmal-golecha', 'https://jito.org/profile/14165-urmila-dilip-chandan', 'https://jito.org/profile/14164-ugamraj-misrimal-mehta', 'https://jito.org/profile/14163-surendra-madanmal-mehta', 'https://jito.org/profile/14162-shrenik-champalal-jain', 'https://jito.org/profile/14161-sanjay-c-jain', 'https://jito.org/profile/14160-ratan-tarachand-mehta', 'https://jito.org/profile/14159-ramesh-sumermal-nahar', 'https://jito.org/profile/14158-rajesh-kumar-bhagchand-mehta', 'https://jito.org/profile/14157-milapchand-bhimraj-mehta', 'https://jito.org/profile/14156-mahendra-nemichand-bafna', 
'https://jito.org/profile/14155-mahendra-kumar-tarachand-mehta', 'https://jito.org/profile/14154-lalit-okhraj-bokadia', 'https://jito.org/profile/14153-lalit-champalal-jain', 'https://jito.org/profile/14152-lakhpatraj-bhagchandji-mehta', 'https://jito.org/profile/14151-kushboo-chirag-chandan', 'https://jito.org/profile/14150-jaswant-bhagchand-mehta']
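
The output above repeats earlier records because `name_list` and `link_list` accumulate across iterations and are printed in full each time. If that is too noisy, a minimal sketch of printing only the records added by the current page follows. The `fake_pages` dictionary is a purely illustrative stand-in for the scraped results (its entries are copied from the output above), and `added_per_page` is a hypothetical name.

```python
# Stand-in for the (name, link) pairs parsed from each page; in the real
# script these would come from soup.select('.name>a').
fake_pages = {
    0: [
        ("NILESH PARASMAL JAIN", "https://jito.org/profile/14230-nilesh-parasmal-jain"),
        ("D K Surana", "https://jito.org/profile/14228-d-k-surana"),
    ],
    20: [
        ("Nikesh Kumar Jain", "https://jito.org/profile/14209-nikesh-kumar-jain"),
    ],
}

name_list = []
link_list = []
added_per_page = {}

for start, records in fake_pages.items():
    before = len(name_list)         # list length before this page
    for name, link in records:
        name_list.append(name)
        link_list.append(link)
    new_names = name_list[before:]  # only the records from this page
    added_per_page[start] = len(new_names)
    print("start={}: {} new, {} total".format(start, len(new_names), len(name_list)))
    print(new_names)
```

A page that reports zero new records is a quick signal that a request failed or that the selector no longer matches.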
