
Beautiful Soup Table to CSV


I'm using Beautiful Soup to scrape a table from a website and extract only specific columns into a CSV file.

from bs4 import BeautifulSoup

# `browser` is a Selenium WebDriver that has already loaded the page
# (its setup is not shown here).
product_table = browser.page_source

soup = BeautifulSoup(product_table, 'html.parser')

# The product data is in the fifth <table> on the page.
table = soup.find_all('table')[4]

table_rows = table.find_all('tr')
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)

Output of print(row):

[]
['', 'CANDY', 'ALBANESE CONFEC', 'Albanese Confectionery Group', 'Gummi Sour Bears 12 Flavor', '12', '7 oz', '17.14', 'CS', '53328', '', 'ACG53328', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', 'Albanese Confectionery Group', 'Gummi Bears 12 Flavor', '12', '7.5 oz', '17.14', 'CS', '53348', '', 'ACG53348', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', 'Albanese Confectionery Group', 'Gummi Mini Worms 12 Flavor', '12', '7.5 oz', '17.14', 'CS', '53350', '', 'ACG53350', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', "Albanese World's Best", 'Gummi Bears 12 Flavor', '6', '9 oz', '11.59', 'CS', '53380', '', 'ACG53380', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', "Albanese World's Best", 'Gummi Mini Worms 12 Flavor', '6', '9 oz', '11.59', 'CS', '53381', '', 'ACG53381', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', "Albanese World's Best", 'Peach Rings', '6', '8 oz', '11.59', 'CS', '53383', '', 'ACG53383', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', "Albanese World's Best", 'Gummi Worms Mini Sour Neon', '6', '8 oz', '11.59', 'CS', '53384', '', 'ACG53384', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', "Albanese World's Best", 'Gummi Bears 12 Flavor', '12', '3.5 oz', '8.23', 'CS', '53450', '', 'ACG53450', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', "Albanese World's Best", 'Gummi Sherbet Bears 12 Flavor', '12', '3.5 oz', '8.23', 'CS', '53456', '', 'ACG53456', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'AMERICAN LICORI', 'Red Vines', 'Red Vines Orig Red Twists Bag', '12', '8 oz', '19.20', 'CS', '00232', '', 'AML00232', '', '\xa0\xa0\xa0\xa0']
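The positions asked about below are 0-based, so an off-by-one mistake is easy to make. A quick way to confirm them is to enumerate one printed row; the list here is copied verbatim from the first data row of the output above, and nothing else is assumed.

```python
# First data row exactly as printed by the loop above.
row = ['', 'CANDY', 'ALBANESE CONFEC', 'Albanese Confectionery Group',
       'Gummi Sour Bears 12 Flavor', '12', '7 oz', '17.14', 'CS', '53328',
       '', 'ACG53328', '', '\xa0\xa0\xa0\xa0']

# Print each cell with its 0-based position; repr() makes empty cells and
# non-breaking spaces (\xa0) visible.
for i, cell in enumerate(row):
    print(i, repr(cell))

# The two wanted values sit at positions 11 and 7.
print(row.index('ACG53328'), row.index('17.14'))  # 11 7
```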

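So my question is: how can I extract only cells [11] and [7] from each row and write them side by side to a CSV file? For example, for row 1 I want to write ACG53328 (column A) and 17.14 (column B) to the CSV file, and continue down the table. In case it makes a difference, there are about 4,000 more rows that I didn't paste here.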
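The selection itself only needs list indexing plus the standard csv module. The sketch below runs on a few of the rows printed above instead of the live page, so it needs no browser. The column labels SKU and Price are assumptions (the question only calls them column A and column B), the empty list is presumed to come from a header row with no <td> cells, and io.StringIO stands in for a real file.

```python
import csv
import io

# A few rows as printed above; the empty list mirrors the row that
# produced no <td> cells (presumably the header row).
rows = [
    [],
    ['', 'CANDY', 'ALBANESE CONFEC', 'Albanese Confectionery Group',
     'Gummi Sour Bears 12 Flavor', '12', '7 oz', '17.14', 'CS', '53328',
     '', 'ACG53328', '', '\xa0\xa0\xa0\xa0'],
    ['', 'CANDY', 'AMERICAN LICORI', 'Red Vines',
     'Red Vines Orig Red Twists Bag', '12', '8 oz', '19.20', 'CS', '00232',
     '', 'AML00232', '', '\xa0\xa0\xa0\xa0'],
]

SKU_COL, PRICE_COL = 11, 7  # 0-based positions of the wanted cells


def write_sku_price(rows, out):
    """Write one (SKU, price) line per data row, skipping rows that are too short."""
    writer = csv.writer(out)
    writer.writerow(['SKU', 'Price'])  # assumed header labels
    for cells in rows:
        if len(cells) <= SKU_COL:
            continue  # e.g. the empty header row
        writer.writerow([cells[SKU_COL].strip(), cells[PRICE_COL].strip()])


buf = io.StringIO()
write_sku_price(rows, buf)
print(buf.getvalue())
```

Against the live table, each list would be built from tr.find_all('td') as in the question's loop, and an open file (opened with newline='') would replace the StringIO buffer. Skipping short rows, rather than writing blanks, keeps the CSV free of empty lines.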

Something like the following should work:

import csv

from bs4 import BeautifulSoup

# `browser` is the Selenium WebDriver from the question.
product_table = browser.page_source

soup = BeautifulSoup(product_table, 'html.parser')
table = soup.find_all('table')[4]

with open('output.csv', 'w', newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['SKU', 'TD_7'])
    for tr in table.find_all('tr'):
        cells = tr.find_all('td')
        # Positions are 0-based: index 11 holds the SKU (e.g. ACG53328)
        # and index 7 the value 17.14. Rows with too few cells (such as
        # a header row) get blank values instead of raising an error.
        try:
            td_11 = cells[11].get_text(strip=True)
        except IndexError:
            td_11 = ""
        try:
            td_07 = cells[7].get_text(strip=True)
        except IndexError:
            td_07 = ""
        writer.writerow([td_11, td_07])


