简体   繁体   English

遍历单击以查找包含链接的表单元格,并通过链接文本查找它,同时使用Selenium和python抓取数据

[英]Iterating over click for the table cells containing the link and finding it by link text while scraping data using selenium and python

I want to save all the scrape data from on click on Survey Number displaying the textarea containing the numbers in the CSV file ..From my code only the last pages having the survey numbers are saved in CSV file .Other data of page1 and page2 not able to save 我要保存所有的刮擦数据,方法是单击“调查号”以显示包含CSV文件中数字的文本区域..从我的代码中,只有具有调查号的最后几页保存在CSV文件中。能够保存

enter code here 在这里输入代码

links = driver.find_elements_by_link_text('SurveyNo')
page_num = 2
while True:
      link1 = driver.find_element_by_link_text(str(page_num))
      link1.click()
      soup=BeautifulSoup(driver.page_source, 'lxml')
      table = soup.find("table" , attrs = 
      {'id':'ctl00_ContentPlaceHolder5_grdUrbanSubZoneWiseRate' })
      data = []
      for l in range(len(links)):

           newlinks = driver.find_elements_by_link_text('SurveyNo')
           newlinks[l].click()
           soup = BeautifulSoup(driver.page_source, 'lxml')
           td1 = soup.find("textarea", attrs={'class': 'textbox'})
           data.append(td1.text)
           for data1 in data:
                with open(os.path.expanduser("output.csv"),"w") as result:
                wr = csv.writer(result, dialect='excel')
                wr.writerow(data)
      page_num += 1

Output: Please see the image Outfile image 输出:请参见图像输出文件图像

I have done a bit modification.instead of table I have taken the links count and then iterate the links.Let me know if this work for you. 我做了一些修改,而不是表,我已经计算了链接数,然后迭代了这些链接。让我知道这是否对您有用。

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.select import Select
URL = 'http://www.igrmaharashtra.gov.in/eASR/eASRCommon.aspx?hDistName=Pune'
driver = webdriver.Chrome()
driver.get(URL)
Select(driver.find_element_by_name('ctl00$ContentPlaceHolder5$ddlTaluka')).select_by_value('5')
Select(driver.find_element_by_name('ctl00$ContentPlaceHolder5$ddlVillage')).select_by_value('1853')

links = driver.find_elements_by_link_text('SurveyNo')

for l in range(len(links)):
    newlinks = driver.find_elements_by_link_text('SurveyNo')
    newlinks[l].click()
    soup = BeautifulSoup(driver.page_source, 'lxml')
    td1 = soup.find("textarea", attrs={'class': 'textbox'})
    print(td1.text)

OutPut: 输出:

1239 , 1520 , 1524 , 1231 , 1223 , 1339 , 1517 , 1518 , 1521 , 1523 , 1528 , 1540 , 1244 , 1519 , 1522 , 1243 , 1186 , 1187 , 1224 , 1227 , 1234 , 1235 , 1238 , 1241 , 1228 
1646 , 1504 , 1527 , 1532 , 1536 , 1541 , 2471 , 1641 , 2463 , 1691 , 1751 , 1755 , 1770 , 1915 , 1929 , 1634 , 1750 , 1771 , 1769 , 1767 , 1766 , 1754 , 1694 , 1752 , 1913 , 2468 , 1753 , 1914 , 1916 , 1919 , 1928 , 1693 , 2467 , 2028 , 2472 , 2473 , 2474 , 2462 , 1539 , 2464 , 1529 , 1530 , 1531 , 1533 , 1506 , 1534 , 1505 , 1535 , 1649 , 1538 , 1507 , 1547 , 1603 , 1604 , 1635 , 1636 , 1640 , 1642 , 1645 , 1647 , 1648 , 1537 
1620 , 1621 
1501 , 1482 , 1548 , 1544 , 1513 , 1497 , 1681 , 1677 , 1673 , 1552 , 1486 , 1525 , 1478 , 1474 , 1470 , 1466 , 1463 , 1459 , 1493 , 1490 , 1579 , 1669 , 1665 , 1662 , 1658 , 1608 , 1602 , 1599 , 1595 , 1591 , 1509 , 1583 , 1556 , 1575 , 1572 , 1568 , 1564 , 1560 , 1587 , 1574 , 1592 , 1593 , 1594 , 1596 , 1597 , 1598 , 1615 , 1600 , 1601 , 1590 , 1605 , 1606 , 1607 , 1609 , 1610 , 1611 , 1612 , 1613 , 1614 , 1576 , 1667 , 1616 , 1565 , 1566 , 1567 , 1569 , 1570 , 1577 , 1573 , 1589 , 1578 , 1580 , 1581 , 1582 , 1584 , 1585 , 1586 , 1588 , 1571 , 1671 , 1656 , 1657 , 1659 , 1660 , 1661 , 1663 , 1666 , 1664 , 1670 , 1653 , 1672 , 1674 , 1675 , 1676 , 1678 , 1679 , 1680 , 1563 , 1668 , 1633 , 1618 , 1619 , 1622 , 1623 , 1624 , 1625 , 1629 , 1630 , 1655 , 1632 , 1654 , 1638 , 1643 , 1644 , 1457 , 1650 , 1651 , 1652 , 1617 , 1631 , 1467 , 1479 , 1455 , 1456 , 1340 , 1458 , 1461 , 1462 , 1453 , 1465 , 1452 , 1468 , 1469 , 1471 , 1472 , 1473 , 1475 , 1476 , 1477 , 1464 , 1352 , 1562 , 1460 , 1341 , 1344 , 1345 , 1346 , 1348 , 1454 , 1351 , 1349 , 1353 , 1445 , 1446 , 1447 , 1448 , 1449 , 1450 , 1451 , 1350 , 1550 , 1514 , 1515 , 1516 , 1526 , 1542 , 1543 , 1545 , 1512 , 1549 , 1554 , 1551 , 1553 , 1555 , 1558 , 1559 , 1561 , 1480 , 1347 , 1546 , 1488 , 1481 , 1483 , 1557 , 1511 , 1484 , 1485 , 1487 , 1489 , 1491 , 1492 , 1498 , 1503 , 1494 , 1500 , 1502 , 1499 , 1508 , 1496 , 1510 , 1495 
1121 , 1125 , 1129 , 1133 , 1110 , 1083 , 1118 , 1114 , 1106 , 1102 , 1098 , 1094 , 1137 , 1087 , 1225 , 1091 , 1232 , 1289 , 1286 , 1282 , 1278 , 1274 , 1270 , 1266 , 1262 , 1259 , 1255 , 1251 , 1216 , 1240 , 1141 , 1071 , 1220 , 1212 , 1208 , 1204 , 1200 , 1196 , 1193 , 1189 , 1152 , 1148 , 1145 , 1247 , 1745 , 1718 , 1706 , 1710 , 1722 , 1725 , 1729 , 1733 , 1698 , 1741 , 1695 , 1749 , 1758 , 1762 , 1768 , 1714 , 1075 , 1079 , 1737 , 1301 , 1064 , 1702 , 1688 , 1067 , 1060 , 1297 , 1305 , 1309 , 1313 , 1316 , 1320 , 1293 , 1311 , 1302 , 1312 , 1308 , 1307 , 1310 , 1306 , 1299 , 1303 , 1300 , 1315 , 1328 , 1298 , 1296 , 1295 , 1304 , 1327 , 1334 , 1333 , 1264 , 1294 , 1332 , 1331 , 1326 , 1329 , 1318 , 1325 , 1324 , 1323 , 1322 , 1321 , 1319 , 1330 , 1254 , 1267 , 1265 , 1263 , 1261 , 1260 , 1258 , 1269 , 1256 , 1271 , 1253 , 1252 , 1250 , 1249 , 1335 , 1726 , 1257 , 1280 , 1291 , 1290 , 1288 , 1287 , 1285 , 1284 , 1268 , 1281 , 1292 , 1279 , 1277 , 1276 , 1275 , 1273 , 1272 , 1283 , 1314 , 1716 , 1735 , 1734 , 1732 , 1731 , 1730 , 1738 , 1727 , 1739 , 1724 , 1248 , 1721 , 1720 , 1719 , 1723 , 1728 , 1748 , 1765 , 1764 , 1763 , 1761 , 1760 , 1759 , 1736 , 1756 , 1715 , 1747 , 1746 , 1744 , 1743 , 1742 , 1740 , 1757 , 1628 , 1717 , 1686 , 1685 , 1684 , 1683 , 1682 , 1689 , 1637 , 1690 , 1627 , 1626 , 1343 , 1342 , 1338 , 1337 , 1639 , 1703 , 1713 , 1712 , 1711 , 1709 , 1708 , 1707 , 1687 , 1704 , 1336 , 1701 , 1700 , 1699 , 1697 , 1696 , 1692 , 1705 , 1112 , 1147 , 1122 , 1120 , 1119 , 1117 , 1116 , 1124 , 1113 , 1126 , 1111 , 1109 , 1108 , 1107 , 1105 , 1104 , 1115 , 1135 , 1146 , 1144 , 1143 , 1142 , 1140 , 1139 , 1123 , 1136 , 1100 , 1134 , 1132 , 1131 , 1130 , 1128 , 1127 , 1138 , 1062 , 1073 , 1072 , 1070 , 1069 , 1068 , 1066 , 1103 , 1063 , 1077 , 1061 , 1059 , 1057 , 1058 , 1246 , 1317 , 1065 , 1086 , 1090 , 1099 , 1097 , 1096 , 1095 , 1093 , 1074 , 1089 , 1076 , 1085 , 1084 , 1082 , 1081 , 1080 , 1078 , 1101 , 1092 , 1198 , 1209 , 1207 , 1206 , 1205 , 1203 , 1202 , 1188 , 1199 , 1213 , 1197 , 1195 , 1194 , 1192 , 1191 , 1190 , 1201 , 1229 , 1245 , 1242 , 1237 , 1236 , 1088 , 1149 , 1210 , 1230 , 1211 , 1226 , 1222 , 1221 , 1219 , 1218 , 1215 , 1214 , 1233 , 1185 , 1166 , 1165 , 1164 , 1163 , 1162 , 1160 , 1167 , 1157 , 1161 , 1217 , 1156 , 1155 , 1154 , 1153 , 1151 , 1150 , 1158 , 1179 , 1177 , 1175 , 1159 , 1178 , 1176 , 1174 , 1173 , 1169 , 1168 , 1171 , 1181 , 1170 , 1180 , 1172 

I think your mistake is possibly that link returns as a list and then you only tell it to click once. 我认为您的错误可能是link返回为列表,然后您只告诉它单击一次。 You may need another loop along the lines of 您可能需要沿着

for x in link;
    x.click();
soup = BeautifulSoup(driver.page_source, 'lxml')
....

(and you may want to rename your variable as links so you can say for link in links ) (并且您可能想将变量重命名为links以便可以for link in links说出for link in links

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM