correctly write a dictionary into a csv python file by columns

How can I correctly write a dictionary to a CSV file? I have written the parsed data into a dictionary, and I want each key's values in a separate column. For one of the key-value pairs (specifically the key 'ff') I want to group and split the values across 5 columns. For example:

0,4,9,14... - in the first column
1,5,10,15... - in the second, etc.
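If the goal is that every 5th element ends up in the same column, a small sketch of one way to do it (the flat list `ff` below is made up for illustration):

```python
# Made-up flat list standing in for the values of key 'ff'
ff = list(range(20))

# Slice every 5th element into the same column:
# column 1 holds 0,5,10,15; column 2 holds 1,6,11,16; etc.
columns = [ff[i::5] for i in range(5)]

# Transpose the columns into rows that csv.writer.writerows() can take
rows = list(zip(*columns))
print(rows[0])  # (0, 1, 2, 3, 4)
```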

The problem is that the data must be saved in UTF-8 encoding, so that the Russian characters in the file display correctly.
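For reference, the UTF-8 "byte order mark" that the code below prepends to the file is just three bytes; Excel and other Windows programs use it to detect that a CSV file is UTF-8 (a Python 3 sketch):

```python
# The UTF-8 byte order mark (BOM) as raw bytes
bom = u"\ufeff".encode("utf-8")
print(bom)  # b'\xef\xbb\xbf'
```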

Here is an example of my code. Currently everything is written into a single column; I want to generate a kind of price list in CSV.

I am using Python 2.7.

import requests
from bs4 import BeautifulSoup
import csv
import re

def get_html(url):
    r = requests.get(url)
    return r.text

url = 'http://www.autobody.ru/kuzovnoy-remont/'
html = get_html(url)
soup = BeautifulSoup(html, 'html.parser')

mydivs = soup.findAll('a', class_="banners_images")

urls = []
for i in mydivs:
    ur = i.get('href')
    ur = 'http://www.autobody.ru' + str(ur)
    urls.append(ur)

images = []
heads = []
artic = []
atrib = []
price = []
for i in urls:
    html = get_html(i)
    soup = BeautifulSoup(html, 'html.parser')
    head = soup.find('h1').get_text()
    heads.append(head)

    image = [x['src'] for x in soup.findAll('img', {'class': 'detimg'})]
    image1 = 'http://www.autobody.ru' + image[0]
    images.append(image1)

    price1 = soup.find('div', class_='price').get_text()
    price1 = re.sub(r"c", r"p", price1)
    price.append(price1)
    for tr in soup.find('table', class_='tech').find_all('tr'):
        artic.append(tr.get_text())
    da = {'titles': heads, 'texts': price, 'ff': artic, 'images': images}

    with open('c:\\1\\121.csv', 'a') as f:
        f.write(u'\ufeff'.encode('utf8'))  # writes "byte order mark" UTF-8 signature
        writer = csv.writer(f)
        for i in da:
            for rows in da[i]:
                writer.writerow([rows.encode('utf8')])

You need to use DictWriter:

  1. Create keys for the column names:

     keys = mydict.keys() 

    or just manually:

     keys = ["column1", "column2"] 
  2. Write the data to CSV (note that the encoding argument to open() requires Python 3; on Python 2.7 use io.open instead):

     with open(file_name, 'a', encoding="utf-8") as output_file:
         dict_writer = csv.DictWriter(output_file, keys, delimiter=',', lineterminator='\n')
         dict_writer.writeheader()
         dict_writer.writerows([mydict])
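Put together, a minimal self-contained sketch of the DictWriter approach (Python 3; the row data below is made up, and io.StringIO stands in for the real output file):

```python
import csv
import io

# Made-up rows mirroring the question's dictionary keys
rows = [
    {"titles": "CIVIC handle", "texts": "295 p"},
    {"titles": "AUDI A4 bumper", "texts": "3882 p"},
]

keys = ["titles", "texts"]
buf = io.StringIO()  # stands in for open(file_name, 'a', encoding='utf-8')
dict_writer = csv.DictWriter(buf, keys, delimiter=",", lineterminator="\n")
dict_writer.writeheader()
dict_writer.writerows(rows)
print(buf.getvalue())
# titles,texts
# CIVIC handle,295 p
# AUDI A4 bumper,3882 p
```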

You have created a normal CSV writer, but are trying to convert your data into a dictionary and write that. You could use a dictionary writer, but I feel it makes more sense to avoid a dictionary here and just convert your data into correctly formatted lists.

Currently you are building all the data in columns, but you will need to write it in row form. Row/column swapping can be done with zip(*[col1, col2, col3]). It also makes sense to encode your data as you go along:

import requests
from bs4 import BeautifulSoup
import csv
import re

def get_html(url):
    r = requests.get(url)
    return r.text

url = 'http://www.autobody.ru/kuzovnoy-remont/'
html = get_html(url)
soup = BeautifulSoup(html, 'html.parser')
mydivs = soup.findAll('a',class_="banners_images")
urls = []

for i in mydivs:
    ur = (i.get('href'))
    ur = 'http://www.autobody.ru' + str(ur)
    urls.append(ur)

images = []
heads = []
artic = []
atrib = []
price = []

with open('121.csv', 'wb') as f:        # Open the file in binary mode for Python 2.x
    f.write(u'\ufeff'.encode('utf8')) # writes "byte order mark" UTF-8 signature
    writer = csv.writer(f)

    for i in urls:
        html = get_html(i)
        soup = BeautifulSoup(html, 'html.parser')
        head = soup.find('h1').get_text()
        heads.append(head.encode('utf8'))

        image = [x['src'] for x in soup.findAll('img', {'class': 'detimg'})]
        image1 = 'http://www.autobody.ru'+image[0]
        images.append(image1.encode('utf8'))

        price1 = soup.find('div', class_='price').get_text()
        price1 = re.sub(r"c",r"p", price1)
        price.append(price1.encode('utf8'))

        for tr in soup.find('table', class_='tech').find_all('tr'):
            artic.append(tr.get_text().strip().encode('utf8'))

    # Write once, after all pages have been scraped, so the accumulated
    # lists are not re-written (and duplicated) on every iteration
    writer.writerows(zip(*[heads, price, artic, images]))

This would give you an output file starting:

CIVIC РУЧКА ПЕРЕД ДВЕРИ ЛЕВ ВНЕШН ЧЕРН,295 p,"Артикул
HDCVC96-500B-L",http://www.autobody.ru/upload/images/HDCVC96-500B-L.jpg.pagespeed.ce.JnqIICpcSq.jpg
AUDI A4 БАМПЕР ПЕРЕДН ГРУНТ,3882 p,"ОЕМ#
72180S04003",http://www.autobody.ru/upload/images/AI0A401-160X.jpg.pagespeed.ce.onSZWY1J15.jpg
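As a tiny standalone illustration of the zip(*...) row/column swap, using stand-in lists:

```python
# Stand-in per-column lists (one entry per scraped page)
heads = ["title1", "title2"]
price = ["100 p", "200 p"]
images = ["img1.jpg", "img2.jpg"]

# zip(*columns) turns per-column lists into per-row tuples
rows = list(zip(*[heads, price, images]))
print(rows)  # [('title1', '100 p', 'img1.jpg'), ('title2', '200 p', 'img2.jpg')]
```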
