Unwanted CSV output scraped from a website | Using Python and Selenium

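I'm having a problem with the CSV export of data I'm trying to scrape from a website.

Output problem: the output goes into columns, but only the first column is filled; all of the data text ends up in that first column.

Or the output goes into rows, but there is only a single row of text.

I just want the text written out the usual way, as a normal table.

This is part of the site's HTML; the specific part I'm targeting is: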

<tbody id="sitesList">
    <tr data-value="11230" class="item-row">
        <td class="text-left"><a href="www.example.com" target="_blank">example.com</a> <i class="fa fa-external-link"></i>
            <br><span>» <a href="www.domain.com/site/11230.html" target="_blank" class="text-danger">view site details</a></span></td>
        <td>92</td>
        <td>71</td>
        <td>Do Follow</td>
        <td style="font-size:12px;font-family:sans-serif !important;">Education
            <br>Family &amp; Parenting
            <br>Food &amp; Drink
            <br>
        </td>
        <td>Included</td>
        <td><strong>$1</strong></td>
        <td><span data-id="11230" class="btn btn-success btn-sm addtocart">Buy Website $1</span></td>
    </tr>
    <tr data-value="11229" class="item-row">
        <td class="text-left"><a href="example1.com/" target="_blank">example1.com</a> <i class="fa fa-external-link"></i>
            <br><span>» <a href="www.domain.com/site/11229.html" target="_blank" class="text-danger">view site details</a></span></td>
        <td>65</td>
        <td>34</td>
        <td>Do Follow</td>
        <td style="font-size:12px;font-family:sans-serif !important;">Business &amp; Finance
            <br>General: Multi-Niche
            <br>
        </td>
        <td>Included</td>
        <td><strong>$2</strong></td>
        <td><span data-id="11229" class="btn btn-success btn-sm addtocart">Buy Website $2</span></td>
    </tr>
    <tr data-value="11228" class="item-row">
        <td class="text-left"><a href="example2.com" target="_blank">example2.com</a> <i class="fa fa-external-link"></i>
            <div class="tooltip owner_tooltip" style="float: right;opacity: 1;width: 20px;height: 20px;background-size: 100%;"><span class="tooltiptext">Owner Verified</span></div>
            <br><span>» <a href="www.domain.com/site/11228.html" target="_blank" class="text-danger">view site details</a></span></td>
        <td>27</td>
        <td>26</td>
        <td>Do Follow</td>
        <td style="font-size:12px;font-family:sans-serif !important;">Cryptocurrency
            <br>
        </td>
        <td>Not Included</td>
        <td><strong>$3</strong></td>
        <td><span data-id="11228" class="btn btn-success btn-sm addtocart">Buy Website $3</span></td>
    </tr>
    <tr data-value="11227" class="item-row">
        <td class="text-left"><a href="example3.com" target="_blank">example3.com</a> <i class="fa fa-external-link"></i>
            <br><span>» <a href="www.domain.com/site/11227.html" target="_blank" class="text-danger">view site details</a></span></td>
        <td>23</td>
        <td>29</td>
        <td>Do Follow</td>
        <td style="font-size:12px;font-family:sans-serif !important;">Business &amp; Finance
            <br>Health
            <br>SEO &amp; Digital Marketing
            <br>
        </td>
        <td>Included</td>
        <td><strong>$4</strong></td>
        <td><span data-id="11227" class="btn btn-success btn-sm addtocart">Buy Website $4</span></td>
    </tr>
</tbody>

I'm using Selenium, and this is my code:

siteList_tds = driver.find_elements(By.XPATH, "//tbody[@id='sitesList']//tr//td")
with open('test.csv', 'w') as f:
    write = csv.writer(f)
    for s in siteList_tds:
        write.writerow(s.text) ## or write.writerow([s.text])

I would try the following:

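import csv
from selenium.webdriver.common.by import By

# `driver` is the already-initialized WebDriver from the question.
siteList_trs = driver.find_elements(By.XPATH, "//tbody[@id='sitesList']//tr")
# newline='' is what the csv module recommends; it avoids blank lines on Windows.
with open('test.csv', 'w', newline='') as f:
    write = csv.writer(f)
    for r in siteList_trs:
        href = r.find_element(By.XPATH, './/a').get_attribute('href')
        tds = r.find_elements(By.XPATH, './/td')
        # href goes first, followed by the text of every cell in this row.
        # (list.insert() returns None, so build the list before writing it.)
        data = [href] + [td.text for td in tds]
        write.writerow(data)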

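What this code does differently:

  1. It finds the tr tags, instead of the td tags, to iterate over
  2. It gets the href attribute from the a tag inside each tr
  3. It gets all of the td elements of that row
  4. It iterates over the td elements to build a list containing each td.text
  5. Finally, it writes the data list as a single row, with the href string placed at the start of that row.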
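
To see why the original loop gives the odd output, it may help to look at how csv.writer.writerow treats its argument: it iterates over whatever it is given, so a bare string is split into one field per character, while a list becomes one field per item. Below is a minimal sketch of that difference that runs without a browser. The sample_rows data and the function names are made up for illustration; they stand in for the td.text values Selenium would return.

```python
import csv
import io

# Hypothetical cell texts for two table rows (stand-ins for td.text values).
sample_rows = [
    ["example.com", "92", "71", "Do Follow",
     "Education\nFamily & Parenting", "Included", "$1"],
    ["example1.com", "65", "34", "Do Follow",
     "Business & Finance", "Included", "$2"],
]


def write_string_directly(text):
    """Pass a bare string to writerow: every character becomes its own field."""
    buf = io.StringIO()
    csv.writer(buf).writerow(text)
    return buf.getvalue()


def write_one_row_per_tr(rows):
    """Pass the full list of cells: one CSV line per table row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        # Cells built from <br> tags contain newlines; join them with "; "
        # so each table row stays on a single CSV line.
        writer.writerow([cell.replace("\n", "; ") for cell in row])
    return buf.getvalue()


if __name__ == "__main__":
    print(repr(write_string_directly("92")))
    print(write_one_row_per_tr(sample_rows))
```

Replacing the newlines inside a cell is optional. The csv module would otherwise quote the cell and keep the line breaks, which some spreadsheet programs display as a multi-line cell.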
