I have a list that looks like the one below, and I need to convert it to multiple rows in either Excel or CSV format:
<tr>
<th>Name</th>
<th>Address1</th>
<th>City</th>
<th>State</th>
<th>Zip</th>
</tr>
<tr>
<th>John</th>
<th>111 Michigan</th>
<th>Chicago </th>
<th>IL</th>
<th>60661</th>
</tr>
Desired result:
Name Address1 City State Zip
John 111 Michigan Chicago IL 60661
Use Beautiful Soup to parse the HTML and print the column values for each row.
I have tried using BeautifulSoup4, but I am only able to get the first row as my result. The rest of the rows come out blank.
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("CofATX.txt"))
table = soup.find('table')
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('th')
    for th in cols:
        text = ''.join(th.text.strip())
        print text + "|",
    print
The result I am getting is Name | Address1 | City | State | Zip. The rest of the rows are blank.
I might use the pandas library for this. You can turn the table into a DataFrame (kind of like an Excel sheet), although we'll have to add <table> tags because they're missing from your text:
import pandas as pd
with open("name.html") as fp:
    text = fp.read()
# Note: infer_types was deprecated and later removed from pandas; omit it on recent versions.
df = pd.read_html("<table>" + text + "</table>", infer_types=False)[0]
which gives us
>>> df
      0             1        2      3      4
0  Name      Address1     City  State    Zip
1  John  111 Michigan  Chicago     IL  60661
which we can save as a csv file:
>>> df.to_csv("out.csv", sep="|", index=False, header=False)
giving
Name|Address1|City|State|Zip
John|111 Michigan|Chicago|IL|60661
or save directly as an Excel file:
>>> df.to_excel("out.xlsx")
pandas is my go-to tool for data munging.
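If pandas isn't an option, the same row-by-row idea can be sketched with only the Python 3 standard library (html.parser and csv), which is a different technique from both the Beautiful Soup attempt and the pandas call above. This is a minimal sketch, not a definitive implementation: the names RowCollector and html_rows_to_csv are made up for illustration, and accepting <td> cells as well as <th> is an assumption about what the real file might contain, since the sample only shows <th>.

```python
import csv
import io
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>. (Hypothetical helper.)"""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the <tr> currently open, if any
        self._cell = None   # text fragments of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            # Assumption: data cells may be <td> even though the sample uses <th>.
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # Strip stray whitespace such as the trailing space in "Chicago ".
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_rows_to_csv(html, delimiter="|"):
    """Return the table rows found in `html` as delimiter-separated text."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(parser.rows)
    return out.getvalue()


sample = """
<tr><th>Name</th><th>Address1</th><th>City</th><th>State</th><th>Zip</th></tr>
<tr><th>John</th><th>111 Michigan</th><th>Chicago </th><th>IL</th><th>60661</th></tr>
"""
print(html_rows_to_csv(sample), end="")
```

This doesn't need the <table> wrapper, because the parser only reacts to the tr, th and td tags. To write a file instead of printing, pass the returned string to a file opened for writing.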