Unwanted CSV Output scraped from a website | Using Python and Selenium
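I'm having a problem with the CSV export of data I'm trying to scrape from a website.
Output problems: the output is in a column, but only the first column; it only fills the first column with data.
Or the output is in a row, but only one row.
I just want the typical tabular output, with one CSV row per table row.
This is part of the whole site's HTML, and it is my specific target: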
<tbody id="sitesList">
<tr data-value="11230" class="item-row">
<td class="text-left"><a href="www.example.com" target="_blank">example.com</a> <i class="fa fa-external-link"></i>
<br><span>» <a href="www.domain.com/site/11230.html" target="_blank" class="text-danger">view site details</a></span></td>
<td>92</td>
<td>71</td>
<td>Do Follow</td>
<td style="font-size:12px;font-family:sans-serif !important;">Education
<br>Family & Parenting
<br>Food & Drink
<br>
</td>
<td>Included</td>
<td><strong>$1</strong></td>
<td><span data-id="11230" class="btn btn-success btn-sm addtocart">Buy Website $1</span></td>
</tr>
<tr data-value="11229" class="item-row">
<td class="text-left"><a href="example1.com/" target="_blank">example1.com</a> <i class="fa fa-external-link"></i>
<br><span>» <a href="www.domain.com/site/11229.html" target="_blank" class="text-danger">view site details</a></span></td>
<td>65</td>
<td>34</td>
<td>Do Follow</td>
<td style="font-size:12px;font-family:sans-serif !important;">Business & Finance
<br>General: Multi-Niche
<br>
</td>
<td>Included</td>
<td><strong>$2</strong></td>
<td><span data-id="11229" class="btn btn-success btn-sm addtocart">Buy Website $2</span></td>
</tr>
<tr data-value="11228" class="item-row">
<td class="text-left"><a href="example2.com" target="_blank">example2.com</a> <i class="fa fa-external-link"></i>
<div class="tooltip owner_tooltip" style="float: right;opacity: 1;width: 20px;height: 20px;background-size: 100%;"><span class="tooltiptext">Owner Verified</span></div>
<br><span>» <a href="www.domain.com/site/11228.html" target="_blank" class="text-danger">view site details</a></span></td>
<td>27</td>
<td>26</td>
<td>Do Follow</td>
<td style="font-size:12px;font-family:sans-serif !important;">Cryptocurrency
<br>
</td>
<td>Not Included</td>
<td><strong>$3</strong></td>
<td><span data-id="11228" class="btn btn-success btn-sm addtocart">Buy Website $3</span></td>
</tr>
<tr data-value="11227" class="item-row">
<td class="text-left"><a href="example3.com" target="_blank">example3.com</a> <i class="fa fa-external-link"></i>
<br><span>» <a href="www.domain.com/site/11227.html" target="_blank" class="text-danger">view site details</a></span></td>
<td>23</td>
<td>29</td>
<td>Do Follow</td>
<td style="font-size:12px;font-family:sans-serif !important;">Business & Finance
<br>Health
<br>SEO & Digital Marketing
<br>
</td>
<td>Included</td>
<td><strong>$4</strong></td>
<td><span data-id="11227" class="btn btn-success btn-sm addtocart">Buy Website $4</span></td>
</tr>
</tbody>
I'm using Selenium, and this is my code:
siteList_tds = driver.find_elements(By.XPATH, "//tbody[@id='sitesList']//tr//td")
with open('test.csv', 'w') as f:
    write = csv.writer(f)
    for s in siteList_tds:
        write.writerow(s.text)  ## or write.writerow([s.text])
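A minimal sketch of why both variants of that `writerow` call give the output described. `csv.writer.writerow` expects an iterable of fields: a bare string is iterated character by character, while a one-element list yields a single field per row. The `cell_texts` values and the `write_cells` helper are hypothetical stand-ins for the `s.text` values Selenium would return, so the example runs without a browser.

```python
import csv
import io

# Hypothetical cell texts, standing in for the s.text values of three <td> elements.
cell_texts = ["92", "71", "Do Follow"]

def write_cells(cells, wrap):
    """Write one CSV row per cell, either as a bare string or wrapped in a list."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for text in cells:
        # wrap=False mimics writerow(s.text); wrap=True mimics writerow([s.text]).
        writer.writerow([text] if wrap else text)
    return buf.getvalue()

as_strings = write_cells(cell_texts, wrap=False)  # each character becomes a field
as_lists = write_cells(cell_texts, wrap=True)     # each cell becomes a one-field row

print(as_strings)
print(as_lists)
```

Either way, every `td` ends up on its own CSV line, because the XPath selects all cells of all rows in one flat list and the row boundaries are lost.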
I would try the following:
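import csv
from selenium.webdriver.common.by import By

# Assumes `driver` is an already-initialised Selenium WebDriver on the target page.
siteList_trs = driver.find_elements(By.XPATH, "//tbody[@id='sitesList']//tr")
with open('test.csv', 'w', newline='') as f:
    write = csv.writer(f)
    for r in siteList_trs:
        # The first link in the row holds the site URL.
        href = r.find_element(By.XPATH, './/a').get_attribute('href')
        tds = r.find_elements(By.XPATH, './/td')
        data = [td.text for td in tds]
        # list.insert() returns None, so insert first and then write the list.
        data.insert(0, href)
        write.writerow(data)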
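The differences in this code are:
- It iterates over the tr tags rather than the td tags.
- It gets the href attribute from the a tag of each tr.
- It gets the td elements of that tr to create a list, data, containing each td.text.
- It writes the data list as one row, with the href string inserted at the start of that row.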
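The row-building step can be tried without a browser. The sketch below assumes the scraped values have already been collected into `(href, cell_texts)` pairs; the `scraped` data and the `build_row` helper are hypothetical, modelled on the first two rows of the HTML in the question. It also collapses the line breaks that the `<br>` tags leave inside a cell. That step is optional, since the `csv` module quotes fields with embedded newlines, but it keeps each record on one physical line.

```python
import csv
import io

# Hypothetical stand-in for the scraped data: one (href, [td.text, ...]) pair per <tr>.
scraped = [
    ("www.example.com",
     ["example.com\n» view site details", "92", "71", "Do Follow",
      "Education\nFamily & Parenting\nFood & Drink", "Included", "$1",
      "Buy Website $1"]),
    ("example1.com/",
     ["example1.com\n» view site details", "65", "34", "Do Follow",
      "Business & Finance\nGeneral: Multi-Niche", "Included", "$2",
      "Buy Website $2"]),
]

def build_row(href, cell_texts):
    """Return one CSV row: the href first, then each cell with line breaks collapsed."""
    cleaned = [
        " | ".join(part.strip() for part in text.splitlines() if part.strip())
        for text in cell_texts
    ]
    return [href] + cleaned

buf = io.StringIO()  # in a real script this would be open('test.csv', 'w', newline='')
writer = csv.writer(buf, lineterminator="\n")
for href, cells in scraped:
    writer.writerow(build_row(href, cells))

output = buf.getvalue()
print(output)
```

Building the full list first and passing it to `writerow` in a separate statement avoids relying on the return value of `list.insert`, which is always `None`.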