
Correctly write a dictionary to a CSV file by columns in Python

How can I correctly write a dictionary to a CSV file? I have parsed data into a dictionary and want to write it so that every key in the dict gets its own column holding that key's values. The values of one key (namely 'ff') should additionally be grouped and split across 5 columns. For example:

0, 4, 9, 14, ... in the first column
1, 5, 10, 15, ... in the second column, and so on.

The data must also be saved in UTF-8 encoding so that the Russian characters in the file display correctly.

Here is my code. At the moment everything is written into a single column; I want to generate a kind of price list in CSV.

I am using Python 2.7

import requests
from bs4 import BeautifulSoup
import csv
import re
def get_html(url):
    r = requests.get(url)
    return r.text
url='http://www.autobody.ru/kuzovnoy-remont/'
html=get_html(url)
soup=BeautifulSoup(html, 'html.parser')


mydivs = soup.findAll('a',class_="banners_images")

urls=[]
for i in mydivs:
    ur = i.get('href')
    ur = 'http://www.autobody.ru' + str(ur)
    urls.append(ur)
#head =[]
#headers = soup.findAll('h1')
#head.append(headers[0].text.strip())
images=[]
heads =[]
artic=[]
atrib=[]
price=[]
for i in urls:
    html = get_html(i)
    soup = BeautifulSoup(html, 'html.parser')
    head = soup.find('h1').get_text()
    heads.append(head)

    image = [x['src'] for x in soup.findAll('img', {'class': 'detimg'})]
    image1 = 'http://www.autobody.ru' + image[0]
    images.append(image1)

    price1 = soup.find('div', class_='price').get_text()
    price1 = re.sub(r"c", r"p", price1)
    price.append(price1)
    for tr in soup.find('table', class_='tech').find_all('tr'):
        artic.append(tr.get_text())
    da = {'titles': heads, 'texts': price, 'ff': artic, 'images': images}

    with open('c:\\1\\121.csv', 'a') as f:
        f.write(u'\ufeff'.encode('utf8')) # writes "byte order mark" UTF-8 signature
        writer = csv.writer(f)
        for i in da:
            for rows in da[i]:
                writer.writerow([rows.encode('utf8')])

You need to use DictWriter:

  1. Create the keys for the column names:

     keys = mydict.keys()

     or list them manually:

     keys = ["column1", "column2"]

  2. Write the data to the CSV file:

     # the encoding= argument of the built-in open() requires Python 3
     with open(file_name, 'a', encoding="utf-8") as output_file:
         dict_writer = csv.DictWriter(output_file, keys, delimiter=',', lineterminator='\n')
         dict_writer.writeheader()
         dict_writer.writerows([mydict])
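The two steps above can be sketched end to end as follows. This is only a minimal sketch, assuming Python 3 and a dictionary whose values are equal-length lists, as in the question. The helper name `write_columns` and the sample data are hypothetical. One point the sketch works around: when the dictionary values are lists, `writerows([mydict])` produces a single row whose cells are the string form of each whole list, so the lists are first turned into one small dict per row.

```python
import csv
import io

def write_columns(columns, stream):
    """Write a dict of equal-length lists as CSV, one column per key."""
    keys = list(columns.keys())
    writer = csv.DictWriter(stream, fieldnames=keys, lineterminator='\n')
    writer.writeheader()
    # zip(*lists) turns the per-column lists into per-row tuples.
    for row in zip(*(columns[k] for k in keys)):
        writer.writerow(dict(zip(keys, row)))

# Hypothetical sample data standing in for the scraped values.
sample = {'titles': ['Ручка', 'Бампер'], 'texts': ['295 p', '3882 p']}

buffer = io.StringIO()          # in-memory stand-in for a real file
write_columns(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same function can be given a file opened with `open(path, 'w', encoding='utf-8-sig', newline='')`; the `utf-8-sig` codec writes the UTF-8 signature that the question adds by hand.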

You have created a normal CSV writer, but are then building a dictionary and trying to write that. You could use a dictionary writer, but I think it makes more sense to skip the dictionary and just convert your data into correctly formatted lists.

Currently you are building all the data in columns, but it needs to be written in row form. Rows and columns can be swapped using zip(*[col1, col2, col3]). It also makes sense to encode your data as you go along:

import requests
from bs4 import BeautifulSoup
import csv
import re

def get_html(url):
    r = requests.get(url)
    return r.text

url = 'http://www.autobody.ru/kuzovnoy-remont/'
html = get_html(url)
soup = BeautifulSoup(html, 'html.parser')
mydivs = soup.findAll('a',class_="banners_images")
urls = []

for i in mydivs:
    ur = (i.get('href'))
    ur = 'http://www.autobody.ru' + str(ur)
    urls.append(ur)

images = []
heads = []
artic = []
atrib = []
price = []

with open('121.csv', 'wb') as f:        # Open the file in binary mode for Python 2.x
    f.write(u'\ufeff'.encode('utf8')) # writes "byte order mark" UTF-8 signature
    writer = csv.writer(f)

    for i in urls:
        html = get_html(i)
        soup = BeautifulSoup(html, 'html.parser')
        head = soup.find('h1').get_text()
        heads.append(head.encode('utf8'))

        image = [x['src'] for x in soup.findAll('img', {'class': 'detimg'})]
        image1 = 'http://www.autobody.ru'+image[0]
        images.append(image1.encode('utf8'))

        price1 = soup.find('div', class_='price').get_text()
        price1 = re.sub(r"c",r"p", price1)
        price.append(price1.encode('utf8'))

        for tr in soup.find('table', class_='tech').find_all('tr'):
            artic.append(tr.get_text().strip().encode('utf8'))

        writer.writerows(zip(*[heads, price, artic, images]))

This would give you an output file starting:

CIVIC РУЧКА ПЕРЕД ДВЕРИ ЛЕВ ВНЕШН ЧЕРН,295 p,"Артикул
HDCVC96-500B-L",http://www.autobody.ru/upload/images/HDCVC96-500B-L.jpg.pagespeed.ce.JnqIICpcSq.jpg
CIVIC РУЧКА ПЕРЕД ДВЕРИ ЛЕВ ВНЕШН ЧЕРН,295 p,"Артикул
HDCVC96-500B-L",http://www.autobody.ru/upload/images/HDCVC96-500B-L.jpg.pagespeed.ce.JnqIICpcSq.jpg
AUDI A4 БАМПЕР ПЕРЕДН ГРУНТ,3882 p,"ОЕМ#
72180S04003",http://www.autobody.ru/upload/images/AI0A401-160X.jpg.pagespeed.ce.onSZWY1J15.jpg
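Two things are worth noting about the output above. Because writerows() is called inside the loop on lists that keep growing, rows from earlier pages are written again on every pass, which is why the first product appears twice. Also, zip() stops at the shortest list, so any extra 'tech' table rows are dropped. For the 5-column grouping asked about in the question, a minimal sketch is shown below. It assumes Python 3 and assumes the intent is consecutive groups of five values; the helper name `chunk` and the sample values are hypothetical.

```python
import csv
import io

def chunk(values, size=5):
    """Split a flat list into consecutive rows of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]

# Hypothetical sample standing in for the scraped 'ff' values.
ff = ['a0', 'a1', 'a2', 'a3', 'a4',
      'b0', 'b1', 'b2', 'b3', 'b4',
      'c0']

rows = chunk(ff, 5)

buffer = io.StringIO()                      # in-memory stand-in for a file
writer = csv.writer(buffer, lineterminator='\n')
writer.writerows(rows)                      # called once, after the rows are built
csv_text = buffer.getvalue()
print(csv_text)
```

Each sublist becomes one CSV row, so values 0, 5, 10 land in the first column and values 1, 6, 11 in the second. Calling writerows() once, after all pages have been processed, avoids the repeated rows.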
