Convert web scraped string list to formatted CSV
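I'm trying to build a quick little script to scrape data from a website and save the results to a formatted CSV.
So far I'm using BeautifulSoup and have been able to get the data I want from the site and encode it so it can be saved to a CSV, but the output is one long column with no logical format (that I can see), and I'm trying to figure out how to convert it.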
Code:
# import libraries
import urllib2
from bs4 import BeautifulSoup
import csv
from datetime import datetime
# specify the url
quote_page = 'url'
# query the website and return the html to the variable 'page'
page = urllib2.urlopen(quote_page)
# parse the html using beautiful soup and store in variable 'soup'
soup = BeautifulSoup(page, 'html.parser')
# Take out the <ul> of name and get its value
name_box = soup.find('ul', attrs={'id': 'list-store-detail'})
name = name_box.text.strip() # strip() is used to remove leading and trailing whitespace
print name
# open a csv file with append, so old data will not be erased
with open('index.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow([name.encode('utf-8')])
Current output:
Name
Address 1
Address 2
Country
Name + Address
Phone Number
Street View
Direction
Name
Address 1
Address 2
Country
Name + Address
Phone Number
Street View
Direction
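As you can see, there is a huge amount of whitespace between the fields, and as far as I can tell it doesn't actually contain any \n or \r.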
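Whitespace that looks empty in printed output is often made of newline and indentation characters. One quick way to check is to print `repr()` of the text, which shows escape sequences such as `\n` and `\r` explicitly. The sketch below uses a made-up string standing in for `name_box.text`, since the real page content isn't shown.

```python
# Hypothetical sample standing in for name_box.text; the real page
# content is not known, so this string is only an illustration.
sample = "\n\n    Name\n    \n    Address 1\r\n    Address 2\n"

# print() renders the whitespace invisibly...
print(sample)
# ...while repr() spells out every newline and carriage return.
print(repr(sample))
```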
I assume I'll have to somehow split the string into lines, loop over them, and then format them properly into the CSV?
Any help would be appreciated.
Your assumption is correct! There may be a more efficient way to do this, but this approach requires very few changes to your code.
Split the string on newlines:
split_name = name.split("\n")
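If the text mixes Windows-style `\r\n` or bare `\r` line endings with plain `\n` (the question mentions `\r`), `split("\n")` can leave a trailing `\r` on some pieces or fail to split at a bare `\r`. Python's `str.splitlines()` treats `\n`, `\r\n` and `\r` all as line boundaries. A small comparison on a made-up string:

```python
# Made-up text with mixed line endings (an assumption for illustration).
text = "Name\r\nAddress 1\nAddress 2\rCountry"

by_newline = text.split("\n")   # "Name\r" keeps its carriage return
by_lines = text.splitlines()    # splits on \r\n, \n and \r alike

print(by_newline)
print(by_lines)
```

With `split("\n")`, the bare `\r` between "Address 2" and "Country" is not a split point, so those two fields end up in one item; `splitlines()` separates them.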
Get rid of the blank lines:
no_blanks = [x for x in split_name if x.strip()]  # also drops lines that contain only spaces
Write to the CSV:
with open('index.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    line = []
    for item in no_blanks:
        line.append(item.strip())
        # each record has 8 fields, so write a row once 8 are collected
        if len(line) == 8:
            writer.writerow(line)
            line = []
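The loop above collects fields until it has eight, then writes a row. The same grouping can be expressed with list slicing. The sketch below is Python 3 (where files passed to `csv.writer` should be opened with `newline=''`), writes to an in-memory buffer so the result can be inspected, and assumes every record has exactly 8 fields in the order shown in the question. `FIELDS_PER_RECORD` and `to_rows` are hypothetical names.

```python
import csv
import io

FIELDS_PER_RECORD = 8  # assumption: each store entry has exactly 8 fields


def to_rows(raw_text, width=FIELDS_PER_RECORD):
    """Split scraped text into lines, drop blank ones, group into rows."""
    items = [line.strip() for line in raw_text.splitlines() if line.strip()]
    # Slice the flat list into consecutive chunks of `width` items.
    return [items[i:i + width] for i in range(0, len(items), width)]


# Placeholder text shaped like the question's output (two records).
record = ["Name", "Address 1", "Address 2", "Country",
          "Name + Address", "Phone Number", "Street View", "Direction"]
raw = "\n\n    ".join(record * 2)

buffer = io.StringIO()  # stands in for open('index.csv', 'a', newline='')
writer = csv.writer(buffer)
writer.writerows(to_rows(raw))
print(buffer.getvalue())
```

One behavioral difference: if the number of items is not a multiple of 8, slicing keeps the leftover items as a short final row, while the loop above never writes them.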
Output:
Name,Address 1,Address 2,Country,Name + Address,Phone Number,Street View,Direction
Name,Address 1,Address 2,Country,Name + Address,Phone Number,Street View,Direction