
python web scraping csv file

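This is my web scraping code, which fetches the page content and exports it to a csv file. Why is there a blank line after every row in the csv file? Can this be fixed? Thanks!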

Python code

import requests
from bs4 import BeautifulSoup
import csv

session = requests.session()

payload = {"i0023":"XXXXXX", 
          "i0025":"XXXXXX"
         }
         
session.post("http://192.168.XXX.XXX/checkLogin.cgi",data = payload)

s = session.get("http://192.168.XXX.XXX/m_departmentid.html")

soup = BeautifulSoup(s.text, "html.parser")

table = soup.find('div', attrs={ "class" : "ItemListComponent"})
tbody = table.find_all('tbody')

rows = []

for row in table.find_all('tr'):
    rows.append([val.text for val in row.find_all('td')[0:6]])

with open('test.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(row for row in rows if row)

HTML source

<div class="ItemListComponent">
<table>
<thead>
<tr><th rowspan="3" scope="col">Department ID</th><th colspan="5" scope="col">Page Total/Page Restriction</th><th rowspan="3" scope="col"></th></tr>
<tr><th colspan="3" scope="col">Total Prints</th><th colspan="1" scope="col">Color</th><th colspan="1" scope="col">Black & White</th></tr>
<tr><th colspan="1" scope="col">Total</th><th colspan="1" scope="col">Color</th><th colspan="1" scope="col">Black & White</th><th colspan="1" scope="col">Print</th><th colspan="1" scope="col">Print</th></tr>

</thead>
<tbody>
<tr><td>7654321</td><td>11</td><td>0</td><td>11</td><td>0</td><td>11</td><td></td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=100">0000100</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(100)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(100)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=101">0000101</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(101)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(101)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=102">0000102</a></td><td>18</td><td>5</td><td>13</td><td>5</td><td>13</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(102)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(102)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=103">0000103</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(103)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(103)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=104">0000104</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(104)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(104)" />
</td></tr>

[Figure 1]

You are opening the file as "wb", which writes bytes. Open it with "w" instead.
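A minimal sketch of why binary mode does not work with `csv.writer` on Python 3, using in-memory buffers instead of the asker's file. The buffer names are illustrative only, and the sample row is taken from the table in the question.

```python
import csv
import io

# One sample row from the table in the question
row = ["0000102", "18", "5"]

# A binary target (what open(..., "wb") gives you) only accepts bytes,
# but csv.writer produces str, so the write fails on Python 3.
binary_buffer = io.BytesIO()
try:
    csv.writer(binary_buffer).writerow(row)
    binary_error = None
except TypeError as exc:
    binary_error = exc

# A text target (what open(..., "w", newline="") gives you) works.
text_buffer = io.StringIO(newline="")
csv.writer(text_buffer).writerow(row)

print(type(binary_error).__name__)
print(repr(text_buffer.getvalue()))
```

On Python 2 the situation was reversed ("wb" was the recommended mode for the csv module), which is likely why advice about "wb" and "w" often conflicts.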

You need to encode the strings to convert them into bytes objects.

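for row in soup.select(".ItemListComponent tbody tr")[1:215]:
    # .encode() turns each cell's text into a bytes object
    row_text = [x.text.encode() for x in row.find_all("td")]
    # bytes can only be joined with a bytes separator
    print(b",".join(row_text))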

Thanks, everyone. I finally found the solution: the newline parameter was missing when opening the file for the csv writer.

Code

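import csv

import requests
from bs4 import BeautifulSoup

session = requests.session()

# Login form fields (values redacted)
payload = {"i0023": "XXXXX",
           "i0025": "XXXXX"}

# Log in first so the session cookie is sent with the next request
session.post("http://192.168.XXX.XXX/checkLogin.cgi", data=payload)

s = session.get("http://192.168.XXX.XXX/m_departmentid.html")

soup = BeautifulSoup(s.text, "html.parser")

table = soup.find('div', attrs={"class": "ItemListComponent"})

rows = []

# Header rows contain only <th> cells, so they produce empty lists here
for row in table.find_all('tr'):
    rows.append([val.text for val in row.find_all('td')])

# newline='' stops the extra blank line after every row
with open('test.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # Skip the empty lists produced by the header rows
    writer.writerows(row for row in rows if row)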
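The effect of `newline=''` can be checked without access to the printer's web page. The sketch below is an illustration, not the asker's code: the helper `write_rows` and the temporary file names are made up for the example, and `newline="\r\n"` is used only to reproduce on any platform the translation that Windows text mode applies by default.

```python
import csv
import os
import tempfile

# Two sample rows shaped like the table in the question
rows = [["7654321", "11", "0"], ["0000102", "18", "5"]]


def write_rows(path, newline):
    """Write rows with csv.writer, then return the raw bytes on disk."""
    # csv.writer ends every row with "\r\n" by itself
    with open(path, "w", newline=newline) as f:
        csv.writer(f).writerows(rows)
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    # "\n" is translated to "\r\n" on write, as Windows text mode does
    translated = write_rows(os.path.join(tmp, "translated.csv"), "\r\n")
    # newline="" switches the translation off
    untranslated = write_rows(os.path.join(tmp, "plain.csv"), "")

print(translated)
print(untranslated)
```

In the translated file each row ends with `\r\r\n`, which spreadsheet programs typically show as an empty line after every record. With `newline=''` each row ends with a single `\r\n`.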

