简体   繁体   中英

Iterating over click for the table cells containing the link and finding it by link text while scraping data using selenium and python

I want to save all the scrape data from on click on Survey Number displaying the textarea containing the numbers in the CSV file ..From my code only the last pages having the survey numbers are saved in CSV file .Other data of page1 and page2 not able to save

enter code here

links = driver.find_elements_by_link_text('SurveyNo')
page_num = 2
while True:
      link1 = driver.find_element_by_link_text(str(page_num))
      link1.click()
      soup=BeautifulSoup(driver.page_source, 'lxml')
      table = soup.find("table" , attrs = 
      {'id':'ctl00_ContentPlaceHolder5_grdUrbanSubZoneWiseRate' })
      data = []
      for l in range(len(links)):

           newlinks = driver.find_elements_by_link_text('SurveyNo')
           newlinks[l].click()
           soup = BeautifulSoup(driver.page_source, 'lxml')
           td1 = soup.find("textarea", attrs={'class': 'textbox'})
           data.append(td1.text)
           for data1 in data:
                with open(os.path.expanduser("output.csv"),"w") as result:
                wr = csv.writer(result, dialect='excel')
                wr.writerow(data)
      page_num += 1

Output: Please see the image Outfile image

I have done a bit modification.instead of table I have taken the links count and then iterate the links.Let me know if this work for you.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.select import Select
URL = 'http://www.igrmaharashtra.gov.in/eASR/eASRCommon.aspx?hDistName=Pune'
driver = webdriver.Chrome()
driver.get(URL)
Select(driver.find_element_by_name('ctl00$ContentPlaceHolder5$ddlTaluka')).select_by_value('5')
Select(driver.find_element_by_name('ctl00$ContentPlaceHolder5$ddlVillage')).select_by_value('1853')

links = driver.find_elements_by_link_text('SurveyNo')

for l in range(len(links)):
    newlinks = driver.find_elements_by_link_text('SurveyNo')
    newlinks[l].click()
    soup = BeautifulSoup(driver.page_source, 'lxml')
    td1 = soup.find("textarea", attrs={'class': 'textbox'})
    print(td1.text)

OutPut:

1239 , 1520 , 1524 , 1231 , 1223 , 1339 , 1517 , 1518 , 1521 , 1523 , 1528 , 1540 , 1244 , 1519 , 1522 , 1243 , 1186 , 1187 , 1224 , 1227 , 1234 , 1235 , 1238 , 1241 , 1228 
1646 , 1504 , 1527 , 1532 , 1536 , 1541 , 2471 , 1641 , 2463 , 1691 , 1751 , 1755 , 1770 , 1915 , 1929 , 1634 , 1750 , 1771 , 1769 , 1767 , 1766 , 1754 , 1694 , 1752 , 1913 , 2468 , 1753 , 1914 , 1916 , 1919 , 1928 , 1693 , 2467 , 2028 , 2472 , 2473 , 2474 , 2462 , 1539 , 2464 , 1529 , 1530 , 1531 , 1533 , 1506 , 1534 , 1505 , 1535 , 1649 , 1538 , 1507 , 1547 , 1603 , 1604 , 1635 , 1636 , 1640 , 1642 , 1645 , 1647 , 1648 , 1537 
1620 , 1621 
1501 , 1482 , 1548 , 1544 , 1513 , 1497 , 1681 , 1677 , 1673 , 1552 , 1486 , 1525 , 1478 , 1474 , 1470 , 1466 , 1463 , 1459 , 1493 , 1490 , 1579 , 1669 , 1665 , 1662 , 1658 , 1608 , 1602 , 1599 , 1595 , 1591 , 1509 , 1583 , 1556 , 1575 , 1572 , 1568 , 1564 , 1560 , 1587 , 1574 , 1592 , 1593 , 1594 , 1596 , 1597 , 1598 , 1615 , 1600 , 1601 , 1590 , 1605 , 1606 , 1607 , 1609 , 1610 , 1611 , 1612 , 1613 , 1614 , 1576 , 1667 , 1616 , 1565 , 1566 , 1567 , 1569 , 1570 , 1577 , 1573 , 1589 , 1578 , 1580 , 1581 , 1582 , 1584 , 1585 , 1586 , 1588 , 1571 , 1671 , 1656 , 1657 , 1659 , 1660 , 1661 , 1663 , 1666 , 1664 , 1670 , 1653 , 1672 , 1674 , 1675 , 1676 , 1678 , 1679 , 1680 , 1563 , 1668 , 1633 , 1618 , 1619 , 1622 , 1623 , 1624 , 1625 , 1629 , 1630 , 1655 , 1632 , 1654 , 1638 , 1643 , 1644 , 1457 , 1650 , 1651 , 1652 , 1617 , 1631 , 1467 , 1479 , 1455 , 1456 , 1340 , 1458 , 1461 , 1462 , 1453 , 1465 , 1452 , 1468 , 1469 , 1471 , 1472 , 1473 , 1475 , 1476 , 1477 , 1464 , 1352 , 1562 , 1460 , 1341 , 1344 , 1345 , 1346 , 1348 , 1454 , 1351 , 1349 , 1353 , 1445 , 1446 , 1447 , 1448 , 1449 , 1450 , 1451 , 1350 , 1550 , 1514 , 1515 , 1516 , 1526 , 1542 , 1543 , 1545 , 1512 , 1549 , 1554 , 1551 , 1553 , 1555 , 1558 , 1559 , 1561 , 1480 , 1347 , 1546 , 1488 , 1481 , 1483 , 1557 , 1511 , 1484 , 1485 , 1487 , 1489 , 1491 , 1492 , 1498 , 1503 , 1494 , 1500 , 1502 , 1499 , 1508 , 1496 , 1510 , 1495 
1121 , 1125 , 1129 , 1133 , 1110 , 1083 , 1118 , 1114 , 1106 , 1102 , 1098 , 1094 , 1137 , 1087 , 1225 , 1091 , 1232 , 1289 , 1286 , 1282 , 1278 , 1274 , 1270 , 1266 , 1262 , 1259 , 1255 , 1251 , 1216 , 1240 , 1141 , 1071 , 1220 , 1212 , 1208 , 1204 , 1200 , 1196 , 1193 , 1189 , 1152 , 1148 , 1145 , 1247 , 1745 , 1718 , 1706 , 1710 , 1722 , 1725 , 1729 , 1733 , 1698 , 1741 , 1695 , 1749 , 1758 , 1762 , 1768 , 1714 , 1075 , 1079 , 1737 , 1301 , 1064 , 1702 , 1688 , 1067 , 1060 , 1297 , 1305 , 1309 , 1313 , 1316 , 1320 , 1293 , 1311 , 1302 , 1312 , 1308 , 1307 , 1310 , 1306 , 1299 , 1303 , 1300 , 1315 , 1328 , 1298 , 1296 , 1295 , 1304 , 1327 , 1334 , 1333 , 1264 , 1294 , 1332 , 1331 , 1326 , 1329 , 1318 , 1325 , 1324 , 1323 , 1322 , 1321 , 1319 , 1330 , 1254 , 1267 , 1265 , 1263 , 1261 , 1260 , 1258 , 1269 , 1256 , 1271 , 1253 , 1252 , 1250 , 1249 , 1335 , 1726 , 1257 , 1280 , 1291 , 1290 , 1288 , 1287 , 1285 , 1284 , 1268 , 1281 , 1292 , 1279 , 1277 , 1276 , 1275 , 1273 , 1272 , 1283 , 1314 , 1716 , 1735 , 1734 , 1732 , 1731 , 1730 , 1738 , 1727 , 1739 , 1724 , 1248 , 1721 , 1720 , 1719 , 1723 , 1728 , 1748 , 1765 , 1764 , 1763 , 1761 , 1760 , 1759 , 1736 , 1756 , 1715 , 1747 , 1746 , 1744 , 1743 , 1742 , 1740 , 1757 , 1628 , 1717 , 1686 , 1685 , 1684 , 1683 , 1682 , 1689 , 1637 , 1690 , 1627 , 1626 , 1343 , 1342 , 1338 , 1337 , 1639 , 1703 , 1713 , 1712 , 1711 , 1709 , 1708 , 1707 , 1687 , 1704 , 1336 , 1701 , 1700 , 1699 , 1697 , 1696 , 1692 , 1705 , 1112 , 1147 , 1122 , 1120 , 1119 , 1117 , 1116 , 1124 , 1113 , 1126 , 1111 , 1109 , 1108 , 1107 , 1105 , 1104 , 1115 , 1135 , 1146 , 1144 , 1143 , 1142 , 1140 , 1139 , 1123 , 1136 , 1100 , 1134 , 1132 , 1131 , 1130 , 1128 , 1127 , 1138 , 1062 , 1073 , 1072 , 1070 , 1069 , 1068 , 1066 , 1103 , 1063 , 1077 , 1061 , 1059 , 1057 , 1058 , 1246 , 1317 , 1065 , 1086 , 1090 , 1099 , 1097 , 1096 , 1095 , 1093 , 1074 , 1089 , 1076 , 1085 , 1084 , 1082 , 1081 , 1080 , 1078 , 1101 , 1092 , 1198 , 1209 , 1207 , 1206 , 1205 , 1203 , 1202 , 1188 , 1199 , 1213 , 1197 , 1195 , 1194 , 1192 , 1191 , 1190 , 1201 , 1229 , 1245 , 1242 , 1237 , 1236 , 1088 , 1149 , 1210 , 1230 , 1211 , 1226 , 1222 , 1221 , 1219 , 1218 , 1215 , 1214 , 1233 , 1185 , 1166 , 1165 , 1164 , 1163 , 1162 , 1160 , 1167 , 1157 , 1161 , 1217 , 1156 , 1155 , 1154 , 1153 , 1151 , 1150 , 1158 , 1179 , 1177 , 1175 , 1159 , 1178 , 1176 , 1174 , 1173 , 1169 , 1168 , 1171 , 1181 , 1170 , 1180 , 1172 

I think your mistake is possibly that link returns as a list and then you only tell it to click once. You may need another loop along the lines of

for x in link;
    x.click();
soup = BeautifulSoup(driver.page_source, 'lxml')
....

(and you may want to rename your variable as links so you can say for link in links )

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM