
Python csv/excel convert single column to multiple rows

I have a list that looks like the one below, and I need to convert it into multiple rows in either Excel or CSV format:

<tr>
<th>Name</th>
<th>Address1</th>
<th>City</th>
<th>State</th>
<th>Zip</th>
</tr>

<tr>
<th>John</th>
<th>111 Michigan</th>
<th>Chicago </th>
<th>IL</th>
<th>60661</th>
</tr>

Desired result:

Name  Address1      City     State  Zip
John  111 Michigan  Chicago  IL     60661
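The desired result above is plain text padded into columns. One way to produce that layout, once the cells have been extracted, is to pad every cell to the width of the widest cell in its column. This is a minimal sketch that assumes the rows are already a list of lists of strings (hard-coded here for illustration); align_columns is a hypothetical helper name, and the two-space gap is an arbitrary choice.

```python
# The two rows from the question, hard-coded for illustration.
rows = [
    ["Name", "Address1", "City", "State", "Zip"],
    ["John", "111 Michigan", "Chicago", "IL", "60661"],
]


def align_columns(rows, gap="  "):
    """Left-justify each cell to the widest entry in its column."""
    # zip(*rows) transposes the rows into columns
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    return [
        # rstrip() drops the padding after the last column
        gap.join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]


for line in align_columns(rows):
    print(line)
```

Note that zip stops at the shortest row, so ragged rows would lose their extra cells.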

Parse the HTML with Beautiful Soup and print the column values for each row.

I have tried using BeautifulSoup4, but I only get the first row as my result; the rest of the rows come out blank.

from bs4 import BeautifulSoup

soup = BeautifulSoup(open("CofATX.txt"))
table = soup.find('table')

rows = table.findAll('tr')

for tr in rows:
    cols = tr.findAll('th')
for th in cols:
    text = ''.join(th.text.strip())
    print text + "|",
print

The result I get is Name | Address1 | City | State | Zip; the rest of the rows are blank.
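As posted, the inner for th in cols: loop and the final print sit at the same indentation as for tr in rows:, so they run only once, after the outer loop has finished, on whatever cols was assigned last. The per-cell work has to be nested inside the row loop. The sketch below shows that row-by-row structure, but it swaps Beautiful Soup for the standard library's html.parser so it runs without third-party packages. RowCollector and parse_rows are hypothetical names, and the sketch assumes cells are <th> or <td> elements inside <tr> rows, as in the question.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            if not self.rows:          # tolerate a cell with no enclosing <tr>
                self.rows.append([])
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:     # ignore text outside cells, e.g. newlines
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # strip() drops padding such as the trailing space in "Chicago "
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def parse_rows(html):
    """Return the table rows in html as a list of lists of strings."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


html = """
<tr>
<th>Name</th><th>Address1</th><th>City</th><th>State</th><th>Zip</th>
</tr>
<tr>
<th>John</th><th>111 Michigan</th><th>Chicago </th><th>IL</th><th>60661</th>
</tr>
"""

rows = parse_rows(html)
for cols in rows:              # outer loop: one pass per <tr>
    print("|".join(cols))      # per-row output stays inside the outer loop
```

To read the question's file instead, pass open("CofATX.txt").read() to parse_rows.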

I would use the pandas library for this. You can turn the table into a DataFrame (similar to an Excel sheet), although we have to wrap the text in <table> tags because they're missing from your input:

import pandas as pd
with open("name.html") as fp:
    text = fp.read()

df = pd.read_html("<table>" + text + "</table>", infer_types=False)[0]

which gives us

>>> df
      0             1        2      3      4
0  Name      Address1     City  State    Zip
1  John  111 Michigan  Chicago     IL  60661

which we can save as a csv file:

>>> df.to_csv("out.csv", sep="|", index=False, header=False)

giving

Name|Address1|City|State|Zip
John|111 Michigan|Chicago|IL|60661
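If pandas isn't available, the standard library's csv module can write the same pipe-delimited output. This is a minimal sketch that assumes the rows are already a list of lists of strings (hard-coded here for illustration); rows_to_psv and write_psv are hypothetical helper names. delimiter="|" plays the role of sep="|", and csv.writer writes no index or header of its own, which matches index=False, header=False.

```python
import csv
import io

# The parsed rows, hard-coded here for illustration.
rows = [
    ["Name", "Address1", "City", "State", "Zip"],
    ["John", "111 Michigan", "Chicago", "IL", "60661"],
]


def rows_to_psv(rows):
    """Return rows as pipe-delimited text, one line per row."""
    buf = io.StringIO()
    # lineterminator="\n" replaces csv's default "\r\n"
    csv.writer(buf, delimiter="|", lineterminator="\n").writerows(rows)
    return buf.getvalue()


def write_psv(path, rows):
    """Write the same text to a file; the csv docs recommend newline=""."""
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter="|", lineterminator="\n").writerows(rows)


print(rows_to_psv(rows), end="")
```

Calling write_psv("out.csv", rows) would produce the file shown above. A cell that itself contains "|" gets wrapped in double quotes, since csv uses minimal quoting by default.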

or save directly as an Excel file:

>>> df.to_excel("out.xlsx", index=False, header=False)

pandas is my go-to tool for data munging.

The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.
