Python formatting data to csv file

I'll try asking for help once more. My base code is ready: it first converts all negative values to 0, and then it calculates the sum and the cumulative values of the CSV data:

import csv
from collections import defaultdict, OrderedDict


def convert(data):
    try:
        return int(data)
    except ValueError:
        return 0


with open('MonthData1.csv', 'r') as file1:
        read_file = csv.reader(file1, delimiter=';')
        delheader = next(read_file)
        data = defaultdict(int)
        for line in read_file:
            valuedata = max(0, sum([convert(i) for i in line[1:5]]))
            data[line[0].split()[0]] += valuedata

        for key in OrderedDict(sorted(data.items())):
            print('{};{}'.format(key, data[key]))
        print("")
        previous_values = []
        for key, value in OrderedDict(sorted(data.items())).items():
            print('{};{}'.format(key, value + sum(previous_values)))
            previous_values.append(value)

This code prints:

1.5.2018 245
2.5.2018 105
4.5.2018 87

1.5.2018 245
2.5.2018 350
4.5.2018 437

That's how I want it to print the data: first the sum for each day, then the cumulative value. My question is: how can I format this data so that it can be written to a new CSV file in the same format as it is printed? The new CSV file should look like the printed output above.

I have tried to do it myself (with datetime) and searched for answers, but I just can't find a way. I hope to get a solution this time; I'd appreciate it massively.
The data file as CSV: https://files.fm/u/2vjppmgv
Data file on Pastebin: https://pastebin.com/Tw4aYdPc
I hope this can be done with the default libraries.

Writing a CSV is simply a matter of writing values separated by commas (or semicolons, in this case). A CSV is a plain text file (a .txt, if you will). You can read and write it using Python's open() function if you'd like to.

You could actually get rid of the CSV module if you wish. I included an example of this at the end.

This version uses only the libraries that were available in your original code.

import csv
from collections import defaultdict, OrderedDict

def convert(data):
    try:
        return int(data)
    except ValueError:
        return 0    

file1 = open('MonthData1.csv', 'r')  # same spelling as in the question; matters on case-sensitive file systems
file2 = open('result.csv', 'w')

read_file = csv.reader(file1, delimiter=';')
delheader = next(read_file)
data = defaultdict(int)
for line in read_file:
    valuedata = max(0, sum([convert(i) for i in line[1:5]]))
    data[line[0].split()[0]] += valuedata

for key in OrderedDict(sorted(data.items())):
    file2.write('{};{}\n'.format(key, data[key]))
file2.write('\n')
previous_values = []
for key, value in OrderedDict(sorted(data.items())).items():
    file2.write('{};{}\n'.format(key, value + sum(previous_values)))
    previous_values.append(value)
file1.close()
file2.close()

There is a gotcha here, though: line endings. I used \n to end each line. Because file2 is opened in text mode, Python translates each \n to the platform's line separator when writing (\r\n on Windows), so the code above works on Linux, Mac and Windows alike. Do not write os.linesep to a file opened in text mode: on Windows that would produce \r\r\n. If you need one specific line ending regardless of platform, pass the newline argument to open() instead (Python 3):

file2 = open('result.csv', 'w', newline='\r\n')
(...)
    file2.write('{};{}\n'.format(key, data[key]))
(...)
    file2.write('{};{}\n'.format(key, value + sum(previous_values)))
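Since the csv module is already imported, another option is to let csv.writer build the lines. The following is only a sketch for Python 3: the function name write_report and the in-memory buffer are made up for illustration, and the sample values are the daily totals printed in the question.

```python
import csv
import io
from itertools import accumulate


def write_report(out, daily_totals):
    """Write daily sums, a blank line, then cumulative sums to `out`.

    `out` is an open text file (or any file-like object).
    `daily_totals` is a list of (date_string, total) pairs, already sorted.
    """
    dates = [date for date, _ in daily_totals]
    totals = [total for _, total in daily_totals]
    # lineterminator='\n' overrides the csv module's default of '\r\n'
    writer = csv.writer(out, delimiter=';', lineterminator='\n')
    writer.writerows(zip(dates, totals))
    out.write('\n')  # blank line between the two sections
    # accumulate() yields the running total: 245, 350, 437 for the sample
    writer.writerows(zip(dates, accumulate(totals)))


sample = [('1.5.2018', 245), ('2.5.2018', 105), ('4.5.2018', 87)]
buffer = io.StringIO()  # stands in for a real file in this sketch
write_report(buffer, sample)
print(buffer.getvalue())
```

With a real file you would open it as `open('result.csv', 'w', newline='')` and pass that object in; newline='' is what the csv documentation recommends so that the writer alone decides the line endings.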

As a side note, this is an example of how you could read your CSV without the CSV module:

with open('MonthData1.csv') as f:
    data = [line.split(';') for line in f.read().splitlines()]

If you had a more complex CSV file, especially one with strings that could contain semicolons, you'd be better off using the CSV module.
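To illustrate the difference, here is a small sketch with a made-up sample (the "note" column and its content are hypothetical and not part of the question's data). A quoted field that contains a semicolon is split apart by str.split, while csv.reader keeps it together.

```python
import csv
import io

# Hypothetical sample: the second field contains a semicolon inside quotes.
sample = 'date;note;value\n1.5.2018;"rain; cold";245\n'

# Naive approach: split every line on ';'
naive_rows = [line.split(';') for line in sample.splitlines()]

# csv module: understands the quoting rules
csv_rows = list(csv.reader(io.StringIO(sample), delimiter=';'))

print(naive_rows[1])  # the quoted field is cut in two
print(csv_rows[1])    # the quoted field stays in one piece
```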

The pandas library, mentioned in other answers, is a great tool. It will most certainly be able to handle any need you might have when dealing with CSV data.

This code creates a new CSV file with the same format as what's printed.

import pandas as pd #added
import csv
from collections import defaultdict, OrderedDict


def convert(data):
    try:
        return int(data)
    except ValueError:
        return 0


keys = [] #added
data_keys = [] #added

with open('MonthData1.csv', 'r') as file1:
        read_file = csv.reader(file1, delimiter=';')
        delheader = next(read_file)
        data = defaultdict(int)
        for line in read_file:
            valuedata = max(0, sum([convert(i) for i in line[1:5]]))
            data[line[0].split()[0]] += valuedata

        for key in OrderedDict(sorted(data.items())):
            print('{} {}'.format(key, data[key]))
            keys.append(key) #added
            data_keys.append(data[key]) #added

        print("")
        keys.append("") #added
        data_keys.append("") #added
        previous_values = []
        for key, value in OrderedDict(sorted(data.items())).items():
            print('{} {}'.format(key, value + sum(previous_values)))
            keys.append(key) #added
            data_keys.append(value + sum(previous_values)) #added
            previous_values.append(value)

df = pd.DataFrame(data_keys,keys) #added
df.to_csv('new_csv_file.csv', header=False) #added

This is a version that does not use any imports at all:

def convert(data):
    try:
        out = int(data)
    except ValueError:
        out = 0
    return out ### try to avoid multiple return statements


# text mode ('r' / 'w') so that lines are str and split(';') also works on Python 3
with open('MonthData1.csv', 'r') as file1:
    lines = file1.readlines()
data = [ [ d.strip() for d in l.split(';')] for l in lines[ 1 : : ] ]
myDict = dict()
for d in data:
    key = d[0].split()[0]
    value = max(0, sum([convert(i) for i in d[1:5]]))
    try:
        myDict[key] += value
    except KeyError:
        myDict[key] = value
s1=""
s2=""
accu = 0
for key in sorted( myDict.keys() ):
    accu += myDict[key]
    s1 += '{} {}\n'.format( key, myDict[key] )
    s2 += '{} {}\n'.format( key, accu )
with open( 'out.txt', 'w') as fPntr:
    fPntr.write( s1 + "\n" + s2 )

This sorts the keys as plain strings, though, so sorted() may give the wrong order (for example, '10.5.2018' sorts before '2.5.2018'). So you might actually want to use datetime, e.g.:

import datetime

# convert() is the helper defined in the previous version
# text mode ('r' / 'w') so that lines are str and split(';') also works on Python 3
with open('MonthData1.csv', 'r') as file1:
    lines = file1.readlines()
data = [ [ d.strip() for d in l.split(';')] for l in lines[ 1 : : ] ]
myDict = dict()
for d in data:
    key  = datetime.datetime.strptime( d[0].split()[0], '%d.%m.%Y' )
    value = max(0, sum([convert(i) for i in d[1:5]]))
    try:
        myDict[key] += value
    except KeyError:
        myDict[key] = value
s1=""
s2=""
accu = 0
for key in sorted( myDict.keys() ):
    accu += myDict[key]
    s1 += '{} {}\n'.format( key.strftime('%d.%m.%y'), myDict[key] )
    s2 += '{} {}\n'.format( key.strftime('%d.%m.%y'), accu )
with open( 'out.txt', 'w') as fPntr:
    fPntr.write( s1 + "\n" + s2 )

Note that I changed to a two-digit year by using %y instead of %Y in the output. This formatting also zero-pads the day and month.
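As a small illustration of both points, the sketch below compares sorting the date strings directly with sorting them by their parsed value, and shows the padded strftime output next to an unpadded alternative. The helper names parse and format_unpadded and the sample dates are made up for this example.

```python
from datetime import datetime

dates = ['2.5.2018', '10.5.2018', '1.5.2018']


def parse(text):
    """Turn a 'day.month.year' string into a datetime."""
    return datetime.strptime(text, '%d.%m.%Y')


def format_unpadded(day):
    """Format a datetime as 'day.month.year' without leading zeros."""
    return '{}.{}.{}'.format(day.day, day.month, day.year)


as_text = sorted(dates)              # compared character by character
as_dates = sorted(dates, key=parse)  # compared as real dates

padded = parse('1.5.2018').strftime('%d.%m.%y')
unpadded = format_unpadded(parse('1.5.2018'))

print(as_text)   # ['1.5.2018', '10.5.2018', '2.5.2018']
print(as_dates)  # ['1.5.2018', '2.5.2018', '10.5.2018']
print(padded)    # 01.05.18
print(unpadded)  # 1.5.2018
```

If the output file should keep the exact date format of the input, sorting with key=parse and writing the original strings (or using something like format_unpadded) avoids the zero padding.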
