I have an HTML table stored in a file. I want to take the value of each td in the table that has an attribute like this:
<td describedby="grid_1-1" ... >Value for CSV</td>
<td describedby="grid_1-1" ... >Value for CSV2</td>
<td describedby="grid_1-1" ... >Value for CSV3</td>
<td describedby="grid_1-2" ... >Value for CSV4</td>
and I want to put each one into a CSV file, with each value on its own line.
So for the file above, the CSV produced would be:
Value for CSV
Value for CSV2
Value for CSV3
Value for CSV4 would be ignored because its describedby is "grid_1-2", not "grid_1-1".
I have tried the code below, but no matter what I try there seems to be (a) a blank line between each printed line and (b) a comma separating each character.
So the output looks more like:
V,a,l,u,e,f,o,r,C,S,V,
V,a,l,u,e,f,o,r,C,S,V,2
What silly thing have I done now?
Thanks :)
import csv
import os
from bs4 import BeautifulSoup

with open("C:\\Users\\ADMIN\\Desktop\\test.html", 'r') as orig_f:
    soup = BeautifulSoup(orig_f.read())
    results = soup.findAll("td", {"describedby":"grid_1-1"})

with open('C:\\Users\\ADMIN\\Desktop\\Deploy.csv', 'wb') as fp:
    a = csv.writer(fp, delimiter=',')
    for result in results :
        a.writerows(result)
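If result is a string inside a list, you need to wrap it in another list. writerows expects an iterable of rows, each row being an iterable of fields, so without the wrapper it treats the string itself as a row and writes each character as a separate field:
a.writerows([result])  # wrap in a list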
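The difference can be reproduced with the csv module alone. This is a minimal sketch, assuming Python 3; the helper to_csv_text and the one-element list standing in for result are made up for illustration and are not part of the original code.

```python
import csv
import io


def to_csv_text(write):
    """Run write(writer) against an in-memory buffer and return what was written."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=",", lineterminator="\n")
    write(writer)
    return buf.getvalue()


# Hypothetical stand-in for a td tag that holds a single string.
result = ["CSV2"]

# writerows(result): each item of result is a row, so the string is split into fields.
split_chars = to_csv_text(lambda w: w.writerows(result))

# writerows([result]): result itself is the single row, the string its single field.
wrapped = to_csv_text(lambda w: w.writerows([result]))

# writerow(result): same outcome for one row, without the extra list.
single_row = to_csv_text(lambda w: w.writerow(result))

print(repr(split_chars))  # 'C,S,V,2\n'
print(repr(wrapped))      # 'CSV2\n'
print(repr(single_row))   # 'CSV2\n'
```

The first call reproduces the comma-per-character symptom from the question; the other two write the value as a single field.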
In your case you should use writerow and extract the text from each td tag in results:
a.writerow([result.text]) # write the text from td element
Your results list already holds all the matching td tags, so you just need to extract the text of each one with .text.
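Put together, the fix might look like the sketch below. It assumes Python 3 with bs4 installed, so the file is opened in text mode with newline="" (as the csv documentation recommends) instead of 'wb'. The inline HTML string, the temporary output path and the explicit "html.parser" argument are assumptions made to keep the example self-contained; the real script would read test.html and write Deploy.csv.

```python
import csv
import os
import tempfile

from bs4 import BeautifulSoup

# Sample markup mirroring the question; the real input would come from test.html.
html = """
<table><tr>
<td describedby="grid_1-1">Value for CSV</td>
<td describedby="grid_1-1">Value for CSV2</td>
<td describedby="grid_1-1">Value for CSV3</td>
<td describedby="grid_1-2">Value for CSV4</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("td", {"describedby": "grid_1-1"})

# Hypothetical output location; a temp directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "Deploy.csv")

# Text mode plus newline="" lets the csv module control the line endings.
with open(out_path, "w", newline="") as fp:
    writer = csv.writer(fp, delimiter=",")
    for result in results:
        writer.writerow([result.text])  # one-element list: one field per row

# Read the file back to check what was written.
with open(out_path, newline="") as fp:
    written = [row[0] for row in csv.reader(fp)]

print(written)
```

On Python 3, leaving out newline="" is a common cause of blank lines between rows on Windows, which may be related to symptom (a) in the question.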
1. Use the lxml and csv modules.
2. Get the text of each td whose describedby attribute has the value grid_1-1 with the xpath() method of lxml.
3. Open the csv file in write mode.
4. Write each value with the writerow() method.

Code:
content = """
<body>
<td describedby="grid_1-1">Value for CSV</td>
<td describedby="grid_1-1">Value for CSV2</td>
<td describedby="grid_1-1">Value for CSV3</td>
<td describedby="grid_1-2">Value for CSV4</td>
</body>
"""

from lxml import etree
import csv

root = etree.fromstring(content)
l = root.xpath("//td[@describedby='grid_1-1']/text()")

with open('/home/vivek/Desktop/output.csv', 'wb') as fp:
    a = csv.writer(fp, delimiter=',')
    for i in l :
        a.writerow([i, ])
Output:
Value for CSV
Value for CSV2
Value for CSV3