I'm trying to scrape some hidden tables (15 per page) that expand when you click an arrow.
The HTML is below (sorry, it's a bit long):
<table class="footable table toggle-arrow-tiny default breakpoint footable-loaded" transparenturl="Images/arrow_none.gif" ascendingurl="Images/arrow_up.gif" customsortdirection="Ascending" custompageindex="0" customsortfield="fullname" custompagealphaindex="A" custompagemode="ABC" custompagealpharelative="A" descendingurl="Images/arrow_down.gif" customvirtualcount="1605" id="MainContent_gw_partners" style="border-collapse:collapse;" cellspacing="0">
<thead>
<tr>
<th data-toggle="true" scope="col" class="footable-visible footable-first-column"> </th><th data-ignore="true" data-hide="phone, tablet" scope="col" class="footable-visible"> </th><th data-ignore="true" data-hide="phone, tablet" scope="col" class="footable-visible">Titolo </th><th scope="col" class="footable-visible">Cognome </th><th data-ignore="true" data-hide="phone, tablet" scope="col" class="footable-visible">NPA </th><th data-ignore="true" data-hide="phone" scope="col" class="footable-visible">Luogo </th><th data-ignore="true" data-hide="phone" scope="col" class="footable-visible footable-last-column">Cantone </th><th data-hide="all" scope="col" style="display: none;">Discipline(s) thérapeutique(s) </th><th data-hide="all" scope="col" style="display: none;">Società </th><th data-hide="all" scope="col" style="display: none;">Cognome </th><th data-hide="all" scope="col" style="display: none;">C/O </th><th data-hide="all" scope="col" style="display: none;">Via </th><th data-hide="all" scope="col" style="display: none;">NPA </th><th data-hide="all" scope="col" style="display: none;">Luogo </th><th data-hide="all" scope="col" style="display: none;">Tel / Cellulare </th><th data-hide="all" scope="col" style="display: none;">Cellulare </th><th data-hide="all" scope="col" style="display: none;">Fax </th><th data-hide="all" scope="col" style="display: none;">e-mail </th><th data-hide="all" scope="col" style="display: none;">Sito WEB </th><th data-hide="all" scope="col" style="display: none;">Altri luoghi di lavoro </th><th data-hide="all" scope="col" style="display: none;">Discipline(s) thérapeutique(s) </th>
</tr>
</thead><tbody>
<tr class="row_white footable-detail-show">
<td class="footable-visible footable-first-column"><span class="footable-toggle"></span> </td><td class="footable-visible">
</td><td class="footable-visible"> </td><td class="footable-visible">
ABBONDANZIERI Katia
</td><td class="footable-visible">
1204
<br>
</td><td class="footable-visible">
Genève
<br>
</td><td class="footable-visible footable-last-column">
GE
<br>
</td><td style="display: none;">
197. Omeopatia, 202. Linfodrenaggio manuale, 205. Massaggio classico, 664. Riflessoterapia generale
</td><td style="display: none;">
</td><td style="display: none;">
ABBONDANZIERI Katia
</td><td style="display: none;">
</td><td style="display: none;">
Place du Cirque, 2
</td><td style="display: none;">
1204
</td><td style="display: none;">
Genève
</td><td style="display: none;">
022 328 23 44
</td><td style="display: none;">
079 601 92 75
</td><td style="display: none;">
</td><td style="display: none;">
</td><td style="display: none;">
</td><td style="display: none;">
</td><td style="display: none;">
<div class="thZone"><div class="zCat">METHODES DE MASSAGE</div><div class="zThr">Linfodrenaggio manuale</div><div class="zThr">Massaggio classico</div><div class="zCat">METHODES PRESCRIPTIVES</div><div class="zThr">Omeopatia</div><div class="zCat">METHODES REFLEXES</div><div class="zThr">Riflessoterapia generale</div></div>
</td>
</tr><tr class="footable-row-detail" style="display: table-row;"><td class="footable-row-detail-cell" colspan="7"><div class="footable-row-detail-inner"><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value">197. Omeopatia, 202. Linfodrenaggio manuale, 205. Massaggio classico, 664. Riflessoterapia generale</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cognome:</div><div class="footable-row-detail-value">ABBONDANZIERI Katia</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Via:</div><div class="footable-row-detail-value">Place du Cirque, 2</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">NPA:</div><div class="footable-row-detail-value">1204</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Luogo:</div><div class="footable-row-detail-value">Genève</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Tel / Cellulare:</div><div class="footable-row-detail-value">022 328 23 44</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cellulare:</div><div class="footable-row-detail-value">079 601 92 75</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value"><div class="thZone"><div class="zCat">METHODES DE MASSAGE</div><div class="zThr">Linfodrenaggio manuale</div><div class="zThr">Massaggio classico</div><div class="zCat">METHODES PRESCRIPTIVES</div><div class="zThr">Omeopatia</div><div class="zCat">METHODES REFLEXES</div><div class="zThr">Riflessoterapia generale</div></div></div></div></div></td></tr><tr class="row_grey footable-detail-show">
<td class="footable-visible footable-first-column"><span class="footable-toggle"></span> </td><td class="footable-visible">
<a href="http://www.kinesiopourtous.ch" target="_blank">
<img title="Link internet" alt="" style="MARGIN-RIGHT: 7px" src="Images/pictoSiteInternet.jpg" width="12" height="12" border="0">
</a>
</td><td class="footable-visible"> </td><td class="footable-visible">
<img id="MainContent_gw_partners_img1_1" src="Images/multi.gif">
ABEGG Sophie
</td><td class="footable-visible">
1212
<br>
1875<br>
</td><td class="footable-visible">
Grand-Lancy
<br>
<nobr>Morgins</nobr><nobr><br>
</nobr></td><td class="footable-visible footable-last-column">
GE
<br>
VS<br>
</td><td style="display: none;">
199. Kinesiologia
</td><td style="display: none;">
Kinéso pour tous
</td><td style="display: none;">
ABEGG Sophie
</td><td style="display: none;">
</td><td style="display: none;">
Rue du Bachet 8
</td><td style="display: none;">
1212
</td><td style="display: none;">
Grand-Lancy
</td><td style="display: none;">
</td><td style="display: none;">
076 365 63 86
</td><td style="display: none;">
</td><td style="display: none;">
<a href="mailto:sophie@kinesiopourtous.ch">sophie[at]kinesiopourtous.ch
</a>
</td><td style="display: none;">
<a href="http://www.kinesiopourtous.ch" target="_blank">
www.kinesiopourtous.ch
</a>
</td><td style="display: none;">
Résidence Bellevue, Rte de France 22, 1875 Morgins, CH<br>
</td><td style="display: none;">
<div class="thZone"><div class="zCat">METHODES ENERGETIQUES MANUELLES</div><div class="zThr">Kinesiologia</div></div>
</td>
</tr><tr class="footable-row-detail"><td class="footable-row-detail-cell" colspan="7"><div class="footable-row-detail-inner"><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value">199. Kinesiologia</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Società:</div><div class="footable-row-detail-value">Kinéso pour tous</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cognome:</div><div class="footable-row-detail-value">ABEGG Sophie</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Via:</div><div class="footable-row-detail-value">Rue du Bachet 8</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">NPA:</div><div class="footable-row-detail-value">1212</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Luogo:</div><div class="footable-row-detail-value">Grand-Lancy</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cellulare:</div><div class="footable-row-detail-value">076 365 63 86</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">e-mail:</div><div class="footable-row-detail-value"><a href="mailto:sophie@kinesiopourtous.ch">sophie[at]kinesiopourtous.ch
</a></div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Sito WEB:</div><div class="footable-row-detail-value"><a href="http://www.kinesiopourtous.ch" target="_blank">
www.kinesiopourtous.ch
</a></div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Altri luoghi di lavoro:</div><div class="footable-row-detail-value">Résidence Bellevue, Rte de France 22, 1875 Morgins, CH<br></div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value"><div class="thZone"><div class="zCat">METHODES ENERGETIQUES MANUELLES</div><div class="zThr">Kinesiologia</div></div></div></div></div></td></tr><tr class="row_white">
<td class="footable-visible footable-first-column"><span class="footable-toggle"></span> </td><td class="footable-visible">
So I'm using Selenium to click and BeautifulSoup 4 to scrape the tables.
I would like to write a loop that clicks each arrow (15 arrows per page) and scrapes the data from each table (13 rows per table; if data is missing, the cell should be blank in the output Excel file).
Any help, please?
If you inspect the network traffic, you can see the Request Method is POST, so I used a different approach and sent the request directly.
If you'd still prefer to use Selenium, just let me know and I can try to work that out too.
You'll need to grab the Form Data and copy it into the payload dictionary. I did not include the whole thing because it's too long, but I included a snippet of it in the code so you can see the format.
Then I just used pandas to grab the table with the data.
import requests
import bs4
import pandas as pd
url = 'http://www.asca.ch/Partners.aspx?lang=it'
headers = {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-US,en;q=0.9',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
'Content-Length': '55755',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Cookie': '_ga=GA1.2.1140629371.1547917375; _gid=GA1.2.1588639047.1547917375; ASP.NET_SessionId=fmxjh5jxwuq10awmqch1ztjz; __AntiXsrfToken=1d9c575ab1494ab29d2e796e2853eaac; _gat=1',
'Host': 'www.asca.ch',
'Origin': 'http://www.asca.ch',
'Referer': 'http://www.asca.ch/Partners.aspx?lang=it',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
'X-MicrosoftAjax': 'Delta=true',
'X-Requested-With': 'XMLHttpRequest'}
payload = {
'ctl00$RadScriptManagerMaster': 'ctl00$RadScriptManagerMaster|ctl00$MainContent$btn_submit',
'RadStyleSheetManager1_TSSM': ';|636398747139118389:c7e0c438;|636304438089400012:39e38b4c;|636304438089880540:19119943;|636304438090200892:b81c9af7;|636304438090180870:bb009068;|636304438089390001:e78ed9b3;|636325253237635520:dedafabf;|636304438089530155:5961cfc1;|636304438090290991:d08fa23c;|636304438089530155:7fafd27a',
'RadScriptManagerMaster_TSM': ';;System.Web.Extensions, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35:en-US:af7dd01d-1544-48f6-a85d-1285ae370050:ea597d4b:b25378d2;||:460a097d:7a38c288:ace9a216;Telerik.Web.UI, Version=2014.1.403.40, Culture=neutral, PublicKeyToken=121fae78165ba3d4:en-US:ca584452-327f-4858-bf00-fb22c6f6fd75:16e4e7cd:ed16cbdc:f7645509:24ee1bba:f46195d3:2003d0b8:88144a7a:1e771326:aa288e2d:258f1c72:7165f74;',
'ctl00$MainContent$ddl_partners':'' ,
'ctl00_MainContent_ddl_partners_ClientState':'' ,
'ctl00$MainContent$ddl_countries': 'Suisse',
'ctl00_MainContent_ddl_countries_ClientState': '',
'ctl00$MainContent$ddl_cantons': 'GE',
...
...
'__ASYNCPOST': 'true',
'RadAJAXControlID': 'ctl00_MainContent_RadAjaxManager1'
}
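r = requests.post(url, headers=headers, data=payload)
# Keep a parsed copy in case you want to use BeautifulSoup on the response
soup = bs4.BeautifulSoup(r.text, 'html.parser')
# read_html returns a list of DataFrames, one per <table> it finds
tables = pd.read_html(r.text)
data = tables[0]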
Output:
print (data)
Unnamed: 0 ... Discipline(s) thérapeutique(s).1
0 NaN ... METHODES DE MASSAGELinfodrenaggio manualeMassa...
1 NaN ... METHODES ENERGETIQUES MANUELLESKinesiologia
2 NaN ... METHODES DE MASSAGEMassaggio classico
3 NaN ... METHODES AYURVEDIQUESHatha YogaMETHODES PSYCHO...
4 NaN ... METHODES DE MASSAGEMassaggio classicoMETHODES ...
5 NaN ... METHODES PRESCRIPTIVESOmeopatia
6 NaN ... METHODES ENERGETIQUES MANUELLESReikiMETHODES O...
7 NaN ... METHODES DE MASSAGEMassaggio tradizionale thai...
8 NaN ... METHODES DE MASSAGEMassaggio classicoMassaggio...
9 NaN ... METHODES DE MASSAGEMassaggio empirico
10 NaN ... METHODES PSYCHOLOGIQUES COMPLEMENTAIRESConsigl...
11 NaN ... METHODES PRESCRIPTIVESConsigli dietetici (MCO)...
12 NaN ... METHODES DE MASSAGEMassaggio classicoMassaggio...
13 NaN ... METHODES DE MASSAGEMassaggio terapeutico
14 NaN ... METHODES DE MASSAGELinfodrenaggio manualeMETHO...
[15 rows x 21 columns]
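Rather than copying every Form Data entry by hand, the hidden inputs could probably be collected from the page itself. Here is a minimal sketch using only the standard library. It assumes the form keeps its state in `<input type="hidden">` elements (typical for ASP.NET WebForms pages, but I have not checked this against the site). `HiddenFieldCollector`, `collect_hidden_fields` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            # A missing value attribute is stored as an empty string
            self.fields[attr["name"]] = attr.get("value") or ""


def collect_hidden_fields(html):
    parser = HiddenFieldCollector()
    parser.feed(html)
    return parser.fields


# Made-up sample; the real page would come from requests.get(url).text
sample = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz" />
  <input type="text" name="search" value="" />
</form>
"""
payload = collect_hidden_fields(sample)
payload["ctl00$MainContent$ddl_cantons"] = "GE"  # then add the visible fields
print(payload)
```

The visible fields (canton, country and so on) still have to be filled in by hand, as in the payload above.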
Here is the Selenium way to expand those tables. There is a better way to handle the time it takes to load, but I wanted to get this to you ASAP, so I just went with time.sleep.
from selenium import webdriver
import time
url = 'http://www.asca.ch/Partners.aspx?lang=it'
driver = webdriver.Chrome()
driver.get(url)
# Click the dropdown, select GE, click Confermo, click Ricerca
driver.find_element_by_xpath('//*[@id="ctl00_MainContent_ddl_cantons_Arrow"]').click()
time.sleep(2)
driver.find_element_by_xpath('//*[@id="ctl00_MainContent_ddl_cantons_DropDown"]/div/ul/li[9]').click()
driver.find_element_by_xpath('//*[@id="MainContent__chkDisclaimer"]').click()
driver.find_element_by_xpath('//*[@id="MainContent_btn_submit"]').click()
time.sleep(5)
# Function to expand the tables
def expand_tables():
    rows = driver.find_elements_by_xpath('//*[@id="MainContent_gw_partners"]/tbody/tr')
    for row in rows:
        row.click()

# Function to click the next-page button
def click_next_page():
    driver.find_element_by_xpath('//*[@id="MainContent_btnNextPackId"]').click()

page = 1
num_of_pages = True
while num_of_pages:
    print('Page: %s' % page)
    expand_tables()
    ## Your code to Parse the Tables ##
    try:
        click_next_page()
        page += 1
    except Exception:
        # The next-page button could not be clicked, so stop looping
        print('You are at the end')
        num_of_pages = False
    time.sleep(2)

# When finished
driver.close()
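As for the better way to wait: instead of a fixed time.sleep, the script can poll until the rows are actually there. Selenium ships WebDriverWait for this. The sketch below is a dependency-free version of the same polling idea, so `wait_until` and its parameters are hypothetical names, not part of Selenium, and `rows_loaded` is only a stand-in for a real page check.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %s seconds' % timeout)
        time.sleep(poll)


# Stand-in for a page that needs a few polls before its rows appear
calls = {'n': 0}


def rows_loaded():
    calls['n'] += 1
    return ['row'] * 15 if calls['n'] >= 3 else []


rows = wait_until(rows_loaded, timeout=5, poll=0.01)
print(len(rows))
```

With the driver above, the condition could be a lambda that returns driver.find_elements_by_xpath('//*[@id="MainContent_gw_partners"]/tbody/tr'), used in place of the time.sleep(5) after the submit click.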
Sorry, I couldn't fit my code into a comment, so I'm posting it as an answer.
This is my code for parsing the tables:
# Find the results table
table = soup.find('table', {'class': 'footable'})
# Get all rows in that table
rows = table.find_all('tr')

# A function to process each row
def processRow(row):
    # All cells with hidden data (the ones that carry a style attribute)
    dataFields = row.find_all('td', {'style': True}
    output = {}
    # Fixed index numbers are not ideal but in this case will work
    output['Discipline'] = dataFields[0].text
    output['Cognome'] = dataFields[2].text
    output['Cellulare'] = dataFields[8].text
    output['email'] = dataFields[10].text
    return output

# Declaring a list to store all results
results = []
# Iterating over all the rows and storing the processed result in a list
for row in rows:
    results.append(processRow(row))
print(results)
click_next_page()
time.sleep(3)
count += 1
Something is not right, though: I get "SyntaxError: invalid syntax" at the line "output = {}", inside the function below the comment "# A function to process each row".
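The SyntaxError most likely comes from the line just above "output = {}": the row.find_all( call is never closed, and older Python versions report an unclosed parenthesis on the following line. Adding the missing ")" should clear it. Separately, looking cells up by header name instead of by fixed index would keep missing values blank and skip the header and detail rows. Below is a minimal sketch of that idea. It swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages; `RowCollector`, `rows_as_dicts` and the trimmed sample table are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect header texts and, per data row, the text of every <td>."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._cell = None       # text pieces of the cell being read, or None
        self._skip_row = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            classes = (dict(attrs).get("class") or "").split()
            # Expanded detail rows repeat the hidden cells, so skip them
            self._skip_row = "footable-row-detail" in classes
            if not self._skip_row:
                self.rows.append([])
        elif tag in ("td", "th") and not self._skip_row:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            (self.headers if tag == "th" else self.rows[-1]).append(text)
            self._cell = None


def rows_as_dicts(html):
    parser = RowCollector()
    parser.feed(html)
    # The header row leaves an empty list behind; empty cells stay as ''
    return [dict(zip(parser.headers, cells)) for cells in parser.rows if cells]


# Trimmed sample built from the HTML in the question
sample = """
<table class="footable">
<thead><tr><th>Cognome</th><th>Cellulare</th><th>e-mail</th></tr></thead>
<tbody>
<tr class="row_white"><td style="display: none;">
ABBONDANZIERI Katia
</td><td style="display: none;">
079 601 92 75
</td><td style="display: none;">
</td></tr>
<tr class="footable-row-detail"><td colspan="7"><div>ABBONDANZIERI Katia</div></td></tr>
<tr class="row_grey"><td style="display: none;">
ABEGG Sophie
</td><td style="display: none;">
076 365 63 86
</td><td style="display: none;">
<a href="mailto:sophie@kinesiopourtous.ch">sophie[at]kinesiopourtous.ch
</a>
</td></tr>
</tbody>
</table>
"""

records = rows_as_dicts(sample)
buffer = io.StringIO()  # stand-in for open('partners.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=["Cognome", "Cellulare", "e-mail"])
writer.writeheader()
writer.writerows(records)
print(buffer.getvalue())
```

The resulting CSV opens directly in Excel. Note that the real table repeats some header names (Cognome, NPA, Luogo), so with this sketch the later, hidden column overwrites the visible one; if both are needed, the duplicates would have to be renamed first.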