Converting a web-scraped list of strings into a formatted CSV

Question

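I'm trying to build a quick little script that scrapes data from a website and saves the results to a formatted CSV.

So far I'm using BeautifulSoup and have been able to get the data I want from the site and encode it so that it can be saved to a CSV, but it comes out as one long string with no logical formatting (that I can see), and I'm trying to figure out how to convert it.

Code:

# import libraries
import urllib2
from bs4 import BeautifulSoup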

import csv
from datetime import datetime

# specify the url
quote_page = 'url'

# query the website and return the html to the variable 'page'
page = urllib2.urlopen(quote_page)

# parse the html using beautiful soup and store in variable 'soup'
soup = BeautifulSoup(page, 'html.parser')

# Find the <ul> that holds the store details and get its text
name_box = soup.find('ul', attrs={'id': 'list-store-detail'})

name = name_box.text.strip() # strip() removes leading and trailing whitespace
print name

# open a csv file with append, so old data will not be erased
with open('index.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow([name.encode('utf-8')])

Current output:

Name 
Address 1
Address 2
Country
Name + Address
Phone Number
Street View
Direction







Name 
Address 1
Address 2
Country
Name + Address
Phone Number
Street View
Direction

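Desired output:

As you can see, there are huge gaps between the entries, and as far as I can tell the text doesn't actually contain any \n or \r.

I assume I'll have to somehow split the string into lines, loop over them, and then format them properly for the CSV?

Any help would be greatly appreciated.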

Answer 1

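Your assumption is correct! There may be a more efficient way to do this, but this approach requires very few code changes.

Split the string on newlines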

split_name = name.split("\n")

Get rid of the blank lines

no_blanks = [x for x in split_name if x.strip()]  # also drops whitespace-only lines
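
For reference, the two steps above can also be folded into one small helper. This is only a sketch in Python 3 (the question's code is Python 2); the name clean_lines and the sample string are made up for illustration and are not from the real page. It uses str.splitlines(), which treats \n, \r\n and \r the same way, and it strips each line before testing it so that whitespace-only lines are dropped too.

```python
def clean_lines(text):
    """Return the non-blank lines of text with surrounding whitespace removed."""
    # splitlines() splits on \n, \r\n and \r alike
    stripped = (line.strip() for line in text.splitlines())
    # an empty string is falsy, so blank and whitespace-only lines are skipped
    return [line for line in stripped if line]


# Hypothetical sample imitating the scraped text, not real data
sample = "Name \nAddress 1\n\n   \n\r\nPhone Number\n"
print(clean_lines(sample))
```

With the real data you would call clean_lines(name) in place of the split and the list comprehension.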

Write to CSV

with open('index.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    line = []
    for field in no_blanks:
        line.append(field.strip())
        # each record has 8 fields, so write a row once 8 are collected
        if len(line) == 8:
            writer.writerow(line)
            line = []

Output

Name,Address 1,Address 2,Country,Name + Address,Phone Number,Street View,Direction
Name,Address 1,Address 2,Country,Name + Address,Phone Number,Street View,Direction
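
If you are on Python 3, the grouping step can be written with slices instead of a running list. The following is a rough sketch rather than a drop-in replacement: chunk and write_rows are hypothetical helper names, the field list is placeholder data copied from the labels in the question, and an in-memory buffer stands in for the real file so the example runs on its own. Like the loop above, it silently drops a trailing group that has fewer than 8 fields.

```python
import csv
import io


def chunk(fields, size):
    """Split fields into consecutive rows of `size` items, dropping an incomplete tail."""
    usable = len(fields) - len(fields) % size
    return [fields[i:i + size] for i in range(0, usable, size)]


def write_rows(rows, csv_file):
    """Write each row as one CSV line to an already opened text file."""
    csv.writer(csv_file).writerows(rows)


# Placeholder data: two records of 8 fields each
fields = ["Name", "Address 1", "Address 2", "Country",
          "Name + Address", "Phone Number", "Street View", "Direction"] * 2

buffer = io.StringIO()  # stands in for open('index.csv', 'a', newline='')
write_rows(chunk(fields, 8), buffer)
print(buffer.getvalue())
```

When writing to a real file in Python 3, open it with newline='' as the csv module documentation recommends, otherwise extra blank lines can appear between rows on Windows.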

Solution 1
0 Accepted 2018-12-11 03:40:33