
Write to CSV in Python, appending columns horizontally each time

I wrote this piece of code, which scrapes Amazon for some elements using a page URL. Now I want to add a CSV function that lets me append columns horizontally with the following variables: (Date_time, price, Merchant, Sellers_count). Each time I run the code, these columns should be added on the right without removing any existing columns. Here is the code, and below it the table format I want to append to.

# -*- coding: cp1252 -*-
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
import requests, csv, time, urllib2, gspread, os, ast, datetime
from scrapy import Selector as s
from lxml import html
from random import randint
from oauth2client.client import SignedJwtAssertionCredentials

x = lambda path: source.xpath(path).extract()

links = ['http://www.amazon.com/dp/B00064NZCK',
         'http://www.amazon.com/dp/B000CIU7F8',
         'http://www.amazon.com/dp/B000H5839I',
         'http://www.amazon.com/dp/B000LTLBHG',
         'http://www.amazon.com/dp/B000SDLXKU',
         'http://www.amazon.com/dp/B000SDLXNC',
         'http://www.amazon.com/dp/B000SPHPWI',
         'http://www.amazon.com/dp/B000UUMHRE']

driver = webdriver.Firefox()
#driver.set_page_load_timeout(30)

for Url in links:
    try:
        driver.get(Url)
    except:
        pass
    time.sleep(randint(1,3))
    try:
        html = driver.page_source
        source = s(text=html,type="html")
    except:
        pass
    try:
        Page_link = x('//link[@rel="canonical"]//@href')
    except:
        pass
    try:
        Product_Name = x('//span[@id="productTitle"]/text()')
    except:
        pass
    Product_Name = Product_Name[0].encode('utf-8') if Product_Name else ''
    try:
        price = x('//span[@id="priceblock_ourprice"]//text()')
    except:
        pass
    try:
        Merchant = x('//div[@id="merchant-info"]//a//text()')
    except:
        pass
    try:
        Sellers_count = x('//span[@class="olp-padding-right"]//a/text()')
    except:
        pass
    if Merchant == []:
        Merchant = 'Amazon'
    else:
        Merchant = Merchant[0]
    price = price[0].encode('utf-8') if price else ''
    if len(Sellers_count) > 0:
        Sellers_count = Sellers_count[0].encode('utf-8')
        try:
            # e.g. "3 new" -> 2 sellers besides the buy-box merchant
            Sellers_count = int(Sellers_count.replace(" new", "")) - 1
        except ValueError:
            pass
    else:
        Sellers_count = ''
    Date_time = datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    print Date_time, Product_Name, Url, price, Merchant, Sellers_count

The existing table format I want to append to:

ASIN    ID  PRODUCT URL
B00064NZCK  MG-5690 BigMouth Inc Over The Hill Parking Privelege Permit http://www.amazon.com/dp/B00064NZCK
B000CIU7F8  BM1102  BigMouth Inc Pocket Disgusting Sounds Machine   http://www.amazon.com/dp/B000CIU7F8
B000H5839I  MG-4774 BigMouth Inc All Occasion Over The Hill Cane    http://www.amazon.com/dp/B000H5839I
B000LTLBHG  BM1234  BigMouth Inc Beer Belt / 6 Pack Holster(Black)  http://www.amazon.com/dp/B000LTLBHG
B000SDLXKU  BM1103  BigMouth Inc Covert Clicker http://www.amazon.com/dp/B000SDLXKU
B000SDLXNC  BM1254  BigMouth Inc Inflatable John    http://www.amazon.com/dp/B000SDLXNC
B000SPHPWI  SO:AP   Design Sense Generic Weener Kleener Soap    http://www.amazon.com/dp/B000SPHPWI
B000UUMHRE  MG-5305 BigMouth Inc Over the Hill Rectal Thermometer   http://www.amazon.com/dp/B000UUMHRE

You have to read the CSV you already have and write a new file that contains the columns you add. Here is an example:

with open('new.csv', 'w') as out_file:
    with open('your.csv', 'r') as in_file:
        for line in in_file:
            out_file.write(line.rstrip('\n') + ',' + Date_time + ',' + Product_Name + '\n')

Obviously, you have to handle the header (the first line, I suppose).
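That header handling can be made concrete with the csv module, which also takes care of quoting. A minimal, self-contained sketch (the sample row, file names, and new values are placeholders, not real scraped data):

```python
import csv

# Seed a tiny sample of the existing file (placeholder data)
with open('existing.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['ASIN', 'ID', 'PRODUCT', 'URL'])
    writer.writerow(['B00064NZCK', 'MG-5690', 'Parking Permit',
                     'http://www.amazon.com/dp/B00064NZCK'])

new_headers = ['Date_time', 'price', 'Merchant', 'Sellers_count']
new_values = ['2015-08-27_12-34-56', '123.45', 'Amazon', '3']  # placeholders

with open('existing.csv') as in_file, open('updated.csv', 'w') as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)
    for index, row in enumerate(reader):
        if index == 0:
            writer.writerow(row + new_headers)  # header line gets column names
        else:
            writer.writerow(row + new_values)   # data lines get the new values
```

Using csv.reader/csv.writer instead of raw string concatenation avoids broken output when a field (such as a product name) itself contains a comma.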

Hope this helps.

The following should do what you need. It reads your existing CSV file and adds the four new column headings. For each URL, your code then obtains the new data, which is appended to the end of the matching existing row (row order does not matter). Finally, an updated CSV file is written:

import csv

links = ['http://www.amazon.com/dp/B00064NZCK',
         'http://www.amazon.com/dp/B000CIU7F8',
         'http://www.amazon.com/dp/B000H5839I',
         'http://www.amazon.com/dp/B000LTLBHG',
         'http://www.amazon.com/dp/B000SDLXKU',
         'http://www.amazon.com/dp/B000SDLXNC',
         'http://www.amazon.com/dp/B000SPHPWI',
         'http://www.amazon.com/dp/B000UUMHRE']

with open('existing.csv', 'r') as f_input:
    csv_input = csv.reader(f_input)

    # Read in the existing CSV file 
    headers = next(csv_input) + ["Date_time", "price", "Merchant", "Sellers_count"]
    rows = list(csv_input)

    # Index rows by URL (column 3) in case the order changes or there are other entries
    url_indexes = {row[3]: index for index, row in enumerate(rows)}

    for url in links:
        # Insert your existing code here to get the actual data

        Date_time = "2015-08-27_12-34-56"
        price = "123.45"
        Merchant = "Def"
        Sellers_count = "42"

        rows[url_indexes[url]].extend([Date_time, price, Merchant, Sellers_count])

# Write the updated CSV to a new file

with open('updated.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(headers)
    csv_output.writerows(rows)
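To get new columns appended on every run, as the question asks, one option is to copy the updated file back over the existing one at the end of each run. A minimal, self-contained sketch of that round trip (file names follow the answer above; the seed rows and scraped values are placeholders):

```python
import csv
import shutil

# Seed a tiny existing file (placeholder data)
with open('existing.csv', 'w') as f:
    csv.writer(f).writerows([
        ['ASIN', 'ID', 'PRODUCT', 'URL'],
        ['B00064NZCK', 'MG-5690', 'Parking Permit',
         'http://www.amazon.com/dp/B00064NZCK'],
    ])

def append_columns(new_headers, new_values_by_url):
    # Read the current file: header plus data rows
    with open('existing.csv') as f_input:
        csv_input = csv.reader(f_input)
        headers = next(csv_input) + new_headers
        rows = list(csv_input)
    # Match scraped values to rows by URL (column 3)
    url_indexes = {row[3]: index for index, row in enumerate(rows)}
    for url, values in new_values_by_url.items():
        rows[url_indexes[url]].extend(values)
    # Write the result, then copy it back so the next run appends again
    with open('updated.csv', 'w') as f_output:
        csv_output = csv.writer(f_output)
        csv_output.writerow(headers)
        csv_output.writerows(rows)
    shutil.copy('updated.csv', 'existing.csv')

# Two runs: the second run's columns land to the right of the first run's
append_columns(['Date_time', 'price', 'Merchant', 'Sellers_count'],
               {'http://www.amazon.com/dp/B00064NZCK':
                ['2015-08-27_12-34-56', '123.45', 'Amazon', '3']})
append_columns(['Date_time', 'price', 'Merchant', 'Sellers_count'],
               {'http://www.amazon.com/dp/B00064NZCK':
                ['2015-08-28_12-34-56', '125.00', 'Def', '2']})
```

After the two calls, the file holds the original four columns followed by two blocks of four scraped columns, one per run, with nothing overwritten.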
