Output elements in CSV columns when scraping a website with python

I need to scrape a book website and save the information (price, code, fees, etc.) in a CSV file as a table. When I try to save the data, the title is repeated several times and the information is laid out vertically. I need each book's information placed horizontally on one row, with the next book's information starting on the following line.

import requests
from bs4 import BeautifulSoup

titles = []
product_description = []

with open('description.csv', 'w') as outf:
    outf.write('title, universal_ product_code (upc)  ,Product Type  ,price_excluding_tax  ,price_including_tax ,tax  ,number_available         ,review_rating,image_url\n')
    with open('url_book.txt', 'r') as file:
        for row in file:
            url = row.strip()
            reponse = requests.get(url)
            if reponse.ok:
                soup = BeautifulSoup(reponse.text, 'html.parser')
                title = soup.find('div', {'class': "col-sm-6 product_main"}).find('h1').text
                titles.append(title + ',')
                print(title)
                tables = soup.find('table')
                #print(tables)
                trs = tables.findAll('tr')
                for ths in trs:
                    #th = ths.find('th').text
                    td = ths.find('td').text
                    info_desc = (td + ',')
                    product_description.append(info_desc)
                    print(info_desc)
                    # one line is written per table cell, so the title repeats
                    outf.write(title + ',' + info_desc + '\n')

Current output:

[image: current output]

Expected output:

[image: expected output]

Inputs:

 http://books.toscrape.com/catalogue/its-only-the-himalayas_981/index.html http://books.toscrape.com/catalogue/full-moon-over-noahs-ark-an-odyssey-to-mount-ararat-and-beyond_811/index.html http://books.toscrape.com/catalogue/see-america-a-celebration-of-our-national-parks-treasured-sites_732/index.html http://books.toscrape.com/catalogue/vagabonding-an-uncommon-guide-to-the-art-of-long-term-world-travel_552/index.html http://books.toscrape.com/catalogue/under-the-tuscan-sun_504/index.html http://books.toscrape.com/catalogue/a-summer-in-europe_458/index.html http://books.toscrape.com/catalogue/the-great-railway-bazaar_446/index.html http://books.toscrape.com/catalogue/a-year-in-provence-provence-1_421/index.html http://books.toscrape.com/catalogue/the-road-to-little-dribbling-adventures-of-an-american-in-britain-notes-from-a-small-island-2_277/index.html http://books.toscrape.com/catalogue/neither-here-nor-there-travels-in-europe_198/index.html http://books.toscrape.com/catalogue/1000-places-to-see-before-you-die_1/index.html http://books.toscrape.com/catalogue/sharp-objects_997/index.html http://books.toscrape.com/catalogue/in-a-dark-dark-wood_963/index.html http://books.toscrape.com/catalogue/the-past-never-ends_942/index.html http://books.toscrape.com/catalogue/a-murder-in-time_877/index.html http://books.toscrape.com/catalogue/the-murder-of-roger-ackroyd-hercule-poirot-4_852/index.html http://books.toscrape.com/catalogue/the-last-mile-amos-decker-2_754/index.html http://books.toscrape.com/catalogue/that-darkness-gardiner-and-renner-1_743/index.html http://books.toscrape.com/catalogue/tastes-like-fear-di-marnie-rome-3_742/index.html http://books.toscrape.com/catalogue/a-time-of-torment-charlie-parker-14_657/index.html http://books.toscrape.com/catalogue/a-study-in-scarlet-sherlock-holmes-1_656/index.html http://books.toscrape.com/catalogue/poisonous-max-revere-novels-3_627/index.html 
http://books.toscrape.com/catalogue/murder-at-the-42nd-street-library-raymond-ambler-1_624/index.html http://books.toscrape.com/catalogue/most-wanted_623/index.html http://books.toscrape.com/catalogue/hide-away-eve-duncan-20_620/index.html http://books.toscrape.com/catalogue/boar-island-anna-pigeon-19_613/index.html http://books.toscrape.com/catalogue/the-widow_609/index.html http://books.toscrape.com/catalogue/playing-with-fire_602/index.html http://books.toscrape.com/catalogue/what-happened-on-beale-street-secrets-of-the-south-mysteries-2_506/index.html http://books.toscrape.com/catalogue/the-bachelor-girls-guide-to-murder-herringford-and-watts-mysteries-1_491/index.html http://books.toscrape.com/catalogue/delivering-the-truth-quaker-midwife-mystery-1_464/index.html http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html http://books.toscrape.com/catalogue/forever-and-forever-the-courtship-of-henry-longfellow-and-fanny-appleton_894/index.html http://books.toscrape.com/catalogue/a-flight-of-arrows-the-pathfinders-2_876/index.html http://books.toscrape.com/catalogue/the-house-by-the-lake_846/index.html http://books.toscrape.com/catalogue/mrs-houdini_821/index.html http://books.toscrape.com/catalogue/the-marriage-of-opposites_759/index.html http://books.toscrape.com/catalogue/glory-over-everything-beyond-the-kitchen-house_696/index.html http://books.toscrape.com/catalogue/love-lies-and-spies_622/index.html http://books.toscrape.com/catalogue/a-paris-apartment_612/index.html http://books.toscrape.com/catalogue/lilac-girls_597/index.html http://books.toscrape.com/catalogue/the-constant-princess-the-tudor-court-1_493/index.html http://books.toscrape.com/catalogue/the-invention-of-wings_448/index.html http://books.toscrape.com/catalogue/world-without-end-the-pillars-of-the-earth-2_420/index.html http://books.toscrape.com/catalogue/the-passion-of-dolssa_351/index.html http://books.toscrape.com/catalogue/girl-with-a-pearl-earring_322/index.html 
http://books.toscrape.com/catalogue/voyager-outlander-3_299/index.html http://books.toscrape.com/catalogue/the-red-tent_273/index.html http://books.toscrape.com/catalogue/the-last-painting-of-sara-de-vos_259/index.html http://books.toscrape.com/catalogue/the-guernsey-literary-and-potato-peel-pie-society_253/index.html http://books.toscrape.com/catalogue/girl-in-the-blue-coat_160/index.html http://books.toscrape.com/catalogue/scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html http://books.toscrape.com/catalogue/tsubasa-world-chronicle-2-tsubasa-world-chronicle-2_949/index.html http://books.toscrape.com/catalogue/this-one-summer_947/index.html http://books.toscrape.com/catalogue/the-nameless-city-the-nameless-city-1_940/index.html http://books.toscrape.com/catalogue/saga-volume-5-saga-collected-editions-5_923/index.html http://books.toscrape.com/catalogue/rat-queens-vol-3-demons-rat-queens-collected-editions-11-15_921/index.html http://books.toscrape.com/catalogue/princess-jellyfish-2-in-1-omnibus-vol-01-princess-jellyfish-2-in-1-omnibus-1_920/index.html http://books.toscrape.com/catalogue/pop-gun-war-volume-1-gift_918/index.html http://books.toscrape.com/catalogue/patience_916/index.html http://books.toscrape.com/catalogue/outcast-vol-1-a-darkness-surrounds-him-outcast-1_915/index.html http://books.toscrape.com/catalogue/orange-the-complete-collection-1-orange-the-complete-collection-1_914/index.html http://books.toscrape.com/catalogue/lumberjanes-vol-2-friendship-to-the-max-lumberjanes-5-8_907/index.html http://books.toscrape.com/catalogue/lumberjanes-vol-1-beware-the-kitten-holy-lumberjanes-1-4_906/index.html http://books.toscrape.com/catalogue/lumberjanes-vol-3-a-terrible-plan-lumberjanes-9-12_905/index.html http://books.toscrape.com/catalogue/i-hate-fairyland-vol-1-madly-ever-after-i-hate-fairyland-compilations-1-5_899/index.html http://books.toscrape.com/catalogue/i-am-a-hero-omnibus-volume-1_898/index.html 
http://books.toscrape.com/catalogue/giant-days-vol-2-giant-days-5-8_895/index.html http://books.toscrape.com/catalogue/danganronpa-volume-1_889/index.html http://books.toscrape.com/catalogue/codename-baboushka-volume-1-the-conclave-of-death_887/index.html http://books.toscrape.com/catalogue/camp-midnight_886/index.html http://books.toscrape.com/catalogue/the-secret-garden_413/index.html http://books.toscrape.com/catalogue/the-metamorphosis_409/index.html http://books.toscrape.com/catalogue/the-pilgrims-progress_353/index.html http://books.toscrape.com/catalogue/the-hound-of-the-baskervilles-sherlock-holmes-5_348/index.html http://books.toscrape.com/catalogue/little-women-little-women-1_331/index.html http://books.toscrape.com/catalogue/gone-with-the-wind_324/index.html http://books.toscrape.com/catalogue/candide_316/index.html http://books.toscrape.com/catalogue/animal-farm_313/index.html http://books.toscrape.com/catalogue/wuthering-heights_307/index.html http://books.toscrape.com/catalogue/the-picture-of-dorian-gray_270/index.html http://books.toscrape.com/catalogue/the-complete-stories-and-poems-the-works-of-edgar-allan-poe-cameo-edition_238/index.html http://books.toscrape.com/catalogue/beowulf_126/index.html http://books.toscrape.com/catalogue/and-then-there-were-none_119/index.html http://books.toscrape.com/catalogue/the-story-of-hong-gildong_84/index.html http://books.toscrape.com/catalogue/the-little-prince_72/index.html http://books.toscrape.com/catalogue/sense-and-sensibility_49/index.html

Python's csv module might help you; it makes this easy. The only thing you need to do is append the items to a list and then write them out all at once; see my_list in the code below.

import csv
from bs4 import BeautifulSoup
import requests

# Python 2 code; see comment below about 'wb'
with open('description.csv', 'wb') as f:
    writer = csv.writer(f)
    # used pseudo headers
    writer.writerow(["a", "b", "c"])

    with open('url_book.txt', 'r') as file:
        for row in file:
            url = row.strip()
            reponse = requests.get(url)
            if reponse.ok:
                soup = BeautifulSoup(reponse.text , 'html.parser')
                title = soup.find('div', {'class': "col-sm-6 product_main"}).find('h1').text.encode('utf-8')

                # one list per book: the title first, then every table cell
                my_list = [title]
                tables = soup.find('table')
                trs = tables.findAll('tr')

                for ths in trs:
                    td = ths.find('td').text.encode('utf-8')
                    my_list.append(td)

                # write the row only when the page was fetched successfully
                writer.writerow(my_list)
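
As a rough, network-free sketch of the same idea in Python 3: collect everything for one book into a single list and call writerow once per book. The helper name book_row and the second, made-up row are assumptions for illustration only; the first row is copied from the output shown in this answer.

```python
import csv
import io


def book_row(title, cells):
    """Return one CSV row: the title followed by the table cells, in order."""
    return [title] + list(cells)


# First entry copied from the answer's output; the second is a made-up
# placeholder whose title contains a comma.
books = [
    ("It's Only the Himalayas",
     ["a22124811bfa8350", "Books", "£45.17", "£45.17", "£0.00",
      "In stock (19 available)", "0"]),
    ("Placeholder, With a Comma", ["0000000000000000", "Books"]),
]

buffer = io.StringIO()
writer = csv.writer(buffer)
for title, cells in books:
    # One writerow call per book, not one write per table cell.
    writer.writerow(book_row(title, cells))

csv_text = buffer.getvalue()
```

Besides putting each book on one line, csv.writer quotes any field that contains a comma, which the manual `title + ','` concatenation in the question would not do.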

You might ask why I used 'wb' in open: I used it to avoid extra blank lines on Windows, see this SO post. If you are on a Mac, just use 'w' as in your code.
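
Note that 'wb' is a Python 2 idiom. On Python 3 csv.writer needs a text-mode file, and the usual way to get the same effect is to open it with newline=''. A minimal sketch, assuming Python 3; the temporary directory and the sample rows are placeholders so the example does not touch real files:

```python
import csv
import os
import tempfile

rows = [["a", "b", "c"], ["1", "2", "3"]]

# Hypothetical scratch location, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "description.csv")

# Text mode with newline="" lets the csv module control the line endings,
# so Windows does not turn "\r\n" into "\r\r\n" (the extra blank lines).
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read the raw bytes back to inspect the line endings.
with open(path, "rb") as f:
    raw = f.read()
```

With a text-mode file the .encode('utf-8') calls are no longer needed on Python 3; the encoding argument of open takes care of that.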

Output:

a,b,c
It's Only the Himalayas,a22124811bfa8350,Books,£45.17,£45.17,£0.00,In stock (19 available),0
Full Moon over Noahâs Ark: An Odyssey to Mount Ararat and Beyond,ce60436f52c5ee68,Books,£49.43,£49.43,£0.00,In stock (15 available),0
See America: A Celebration of Our National Parks & Treasured Sites,f9705c362f070608,Books,£48.87,£48.87,£0.00,In stock (14 available),0
Vagabonding: An Uncommon Guide to the Art of Long-Term World Travel,1809259a5a5f1d8d,Books,£36.94,£36.94,£0.00,In stock (8 available),0
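
The pseudo headers a,b,c can be swapped for real column names. The sketch below (Python 3) adapts the names from the header string in the question and trims them to the eight values each row holds, since the loop never collects image_url. Pairing names with cells purely by position is an assumption, and the length check is an added safeguard that is not in the original code.

```python
import csv
import io

# Column names adapted from the question's header string; assumed to line up
# with the cells by position.
HEADER = [
    "title", "universal_product_code (upc)", "product_type",
    "price_excluding_tax", "price_including_tax", "tax",
    "number_available", "review_rating",
]

# Row copied from the output above.
row = ["It's Only the Himalayas", "a22124811bfa8350", "Books", "£45.17",
       "£45.17", "£0.00", "In stock (19 available)", "0"]

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(HEADER)
if len(row) == len(HEADER):  # skip rows that do not have the expected shape
    writer.writerow(row)

# Parse the text back to confirm header and row line up column by column.
table = list(csv.reader(io.StringIO(out.getvalue())))
```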
