
Python: CSV write by column rather than row

I have a python script that generates a bunch of data in a while loop. I need to write this data to a CSV file so that it writes by column rather than by row.

For example, in loop 1 of my script I generate:

(1, 2, 3, 4)

I need this to be reflected in my csv file like so:

Result_1    1
Result_2    2
Result_3    3
Result_4    4

On my second loop I generate:

(5, 6, 7, 8)

I need my csv file to look like this:

Result_1    1    5
Result_2    2    6
Result_3    3    7
Result_4    4    8

and so forth until the while loop finishes. Can anybody help me?


EDIT

The while loop can run for over 100,000 iterations.

The reason csv doesn't support that is that most filesystems can't grow a line in place: lines are variable-length, so adding to one would mean rewriting everything after it. What you should do instead is collect all the data in lists, then call zip() on them to transpose them afterwards.

>>> l = [('Result_1', 'Result_2', 'Result_3', 'Result_4'), (1, 2, 3, 4), (5, 6, 7, 8)]
>>> list(zip(*l))  # in Python 3, zip() returns an iterator; list() materializes it
[('Result_1', 1, 5), ('Result_2', 2, 6), ('Result_3', 3, 7), ('Result_4', 4, 8)]
wr.writerow(item)   # one row: the elements of item go into successive columns
wr.writerows(item)  # one row per element of item (each element must itself be iterable)
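The difference between the two writer methods is easy to trip over, so here is a small sketch that captures their output in memory with io.StringIO. The helper names one_row and one_row_per_element are made up for illustration; the csv calls are the standard library's.

```python
import csv
import io


def one_row(item):
    """writerow(): the whole iterable becomes ONE row, one cell per element."""
    buf = io.StringIO()
    csv.writer(buf).writerow(item)
    return buf.getvalue()


def one_row_per_element(item):
    """writerows(): each element of the iterable becomes its own row."""
    buf = io.StringIO()
    csv.writer(buf).writerows(item)
    return buf.getvalue()


print(repr(one_row([1, 2, 3])))                      # '1,2,3\r\n'
print(repr(one_row_per_element([[1], [2], [3]])))    # '1\r\n2\r\n3\r\n'
# Pitfall: strings are iterable, so each one is split into single characters.
print(repr(one_row_per_element(['ab', 'cd'])))       # 'a,b\r\nc,d\r\n'
```

Note that writerows([1, 2, 3]) with bare numbers raises csv.Error, because each element has to be a row (an iterable) in its own right.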

This is quite simple if your goal is just to write the output down a single column.

If your item is a list:

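import csv

yourList = []  # the values to write, one per row

with open('yourNewFileName.csv', 'w', newline='') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    for word in yourList:
        wr.writerow([word])  # wrap each value in a list so it fills exactly one cell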

Updating lines in place is not supported by most filesystems: a line in a file is just some data that ends with a newline, and the next line starts right after it.

As I see it, you have two options:

  1. Turn your data-generating loops into generators; that way they won't consume a lot of memory, and you'll get the data for each row "just in time".
  2. Use a database (SQLite?) and update the rows there. When you're done, export to CSV.

Small example for the first method:

from itertools import islice, count
print(list(islice(zip(count(1), count(2), count(3)), 10)))  # Python 3: zip is lazy like izip

This will print

[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9, 10), (9, 10, 11), (10, 11, 12)]

even though each count generates an infinite sequence of numbers.
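The second option (a database such as SQLite) could look roughly like the sketch below, using the standard-library sqlite3 module. Rather than updating rows in place, it swaps in a "long" layout: one record per cell (row index, column index, value), pivoted back into rows on export. The function name and table layout are assumptions for illustration, not part of the original answer.

```python
import sqlite3


def transposed_rows_via_sqlite(loops, headings, db_path=':memory:'):
    """Yield one output row per heading; each item of `loops` becomes a column.

    Assumes a fresh database (the table is created here). Pass a file path as
    db_path to keep the data on disk instead of in memory.
    """
    con = sqlite3.connect(db_path)
    try:
        con.execute('CREATE TABLE cells (row_idx INTEGER, col_idx INTEGER, value)')
        # Store each while-loop pass as one column of cells.
        for col_idx, values in enumerate(loops):
            con.executemany(
                'INSERT INTO cells VALUES (?, ?, ?)',
                [(row_idx, col_idx, v) for row_idx, v in enumerate(values)],
            )
        # Index so each per-row query below doesn't scan the whole table.
        con.execute('CREATE INDEX cells_by_row ON cells (row_idx, col_idx)')
        con.commit()
        # Export: read back one row at a time, columns in loop order.
        for row_idx, heading in enumerate(headings):
            cur = con.execute(
                'SELECT value FROM cells WHERE row_idx = ? ORDER BY col_idx',
                (row_idx,),
            )
            yield [heading] + [v for (v,) in cur]
    finally:
        con.close()
```

Because the function is a generator, it can be handed straight to a writer, e.g. csv.writer(f).writerows(transposed_rows_via_sqlite(loops, headings)), so only one output row is held in Python at a time.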

Write it out row by row and then transpose it on the command line. If you're using Unix, install csvtool and follow the directions in: https://unix.stackexchange.com/a/314482/186237
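If csvtool isn't available, the same transpose step can be done in Python. This is a minimal sketch that assumes the row-by-row file fits in memory; the function names are hypothetical, and rows of unequal length are padded with empty cells via itertools.zip_longest.

```python
import csv
import io
from itertools import zip_longest


def transpose_csv(src, dst):
    """Read CSV rows from file object src and write them transposed to dst."""
    rows = list(csv.reader(src))  # the whole file is held in memory
    csv.writer(dst).writerows(zip_longest(*rows, fillvalue=''))


def transpose_csv_text(text):
    """Convenience wrapper that transposes CSV text in memory."""
    src, dst = io.StringIO(text), io.StringIO()
    transpose_csv(src, dst)
    return dst.getvalue()
```

For real files, open both with newline='' (as the csv module recommends) and pass the file objects to transpose_csv.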

What about Result_*? Are those also generated in the loop? (I ask because I don't think it's possible to add columns to an existing csv file.)

I would go like this: generate all the data at once, rotate (transpose) the matrix, then write it to the file:

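import csv

A = []
A.append(range(1, 5))  # an example of your first loop
A.append(range(5, 9))  # an example of your second loop

data_to_write = zip(*A)  # transpose: each tuple is now one output row

# then you can write it row by row ('output.csv' is a placeholder name)
with open('output.csv', 'w', newline='') as f:
    csv.writer(f).writerows(data_to_write)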
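To also get the Result_* labels from the question into the first column, one option is to put the headings in as the first "column" before transposing, just as in the zip(*l) example earlier in the thread. This is a sketch; rows_by_column and write_rows are hypothetical helper names.

```python
import csv


def rows_by_column(headings, loops):
    """Turn per-loop tuples into output rows, with headings as the first column.

    `loops` is any iterable yielding one tuple per while-loop pass.
    """
    columns = [tuple(headings)]
    for values in loops:              # one entry per loop pass
        columns.append(tuple(values))
    return list(zip(*columns))        # row i = (heading i, value i of each loop)


def write_rows(path, rows):
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
```

For example, rows_by_column(['Result_1', 'Result_2', 'Result_3', 'Result_4'], [(1, 2, 3, 4), (5, 6, 7, 8)]) gives the two-column layout shown in the question. All loop results are kept in memory until the end, which is usually fine for 100,000 small tuples.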

Let's assume that (1) you don't have a lot of memory, (2) you have the row headings in a list, and (3) all the data values are floats; if they're all integers that fit in 32 or 64 bits, that's even better.

On a 32-bit Python, storing a float in a list takes 16 bytes for the float object and 4 bytes for the pointer in the list, 20 in total. Storing a float in an array.array('d') takes only 8 bytes. Increasingly spectacular savings are available if all your data are ints (any negatives?) that fit in 8, 4, 2 or 1 byte(s), especially on Python 3, where all ints are arbitrary-precision objects (what Python 2 called longs).

The following sketch assumes floats stored in array.array('d'). Even if you don't really have a memory problem, you can still use this method; the comments indicate the changes needed if you want to use a list instead.

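# Preliminary:
import array  # list: delete
import csv

N_ROWS, N_COLS = 4, 3  # hypothetical sizes, for illustration only

def calculated_data_value(row_index, col_index):
    # hypothetical stand-in for your real calculation
    return float(row_index + 1 + 4 * col_index)

hlist = []
dlist = []
for row_index in range(N_ROWS):
    hlist.append('Result_%d' % (row_index + 1))  # some heading string
    dlist.append(array.array('d'))  # list: dlist.append([])
# generate data, one column (loop pass) at a time
for col_index in range(N_COLS):
    for row_index in range(len(hlist)):
        v = calculated_data_value(row_index, col_index)
        dlist[row_index].append(v)
# write to csv file ('output.csv' is a placeholder name)
with open('output.csv', 'w', newline='') as f:
    csv_writer = csv.writer(f)
    for row_index in range(len(hlist)):
        row = [hlist[row_index]]
        row.extend(dlist[row_index])
        csv_writer.writerow(row)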
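The per-value figures above depend on the interpreter build, so a quick way to check them on your own machine is sketched below. It adds sys.getsizeof of a float to the size of one list slot (a pointer, via struct.calcsize('P')) and compares that with array item sizes; it ignores list over-allocation and array growth slack. On a typical 64-bit CPython the list figure comes out larger than the 32-bit numbers quoted above.

```python
import array
import struct
import sys


def bytes_per_value():
    """Approximate storage cost of one value in each container on this interpreter."""
    pointer = struct.calcsize('P')  # one list slot holds a pointer to the object
    return {
        'float in list': sys.getsizeof(1.0) + pointer,
        "array('d')": array.array('d').itemsize,  # C double
        "array('q')": array.array('q').itemsize,  # signed long long (at least 8)
        "array('i')": array.array('i').itemsize,  # C int (platform-dependent, often 4)
        "array('h')": array.array('h').itemsize,  # signed short (usually 2)
        "array('b')": array.array('b').itemsize,  # signed char (1)
    }


for name, size in bytes_per_value().items():
    print(f'{name:>14}: {size} bytes')
```

If none of your values are negative, the unsigned typecodes ('B', 'H', 'I', 'Q') give the same item sizes with twice the positive range.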

As an alternative streaming approach:

  • dump each column into its own file
  • use Python or the Unix paste command to rejoin them with tabs, commas, or whatever delimiter you need.

Both steps should handle streaming just fine.

Pitfalls:

  • if you have thousands of columns, you might run into the Unix limit on open file handles!
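The two steps above can be sketched in pure Python as follows. Each loop pass writes its values to a separate file (one value per line), and a paste-like step then reads all column files in lockstep and writes the CSV. The function and file names are hypothetical, and the sketch assumes values contain no newlines. Note that paste_columns keeps every column file open at once, which is exactly where the file-handle limit mentioned above comes from.

```python
import csv
import os
import tempfile
from contextlib import ExitStack


def dump_column(values, path):
    """Write one loop's values to their own file, one value per line."""
    with open(path, 'w') as f:
        for v in values:
            f.write(f'{v}\n')


def paste_columns(column_paths, headings, out_path):
    """Rejoin the column files line by line (like paste -d,) into a CSV."""
    with ExitStack() as stack, open(out_path, 'w', newline='') as out:
        # All column files are open simultaneously: one handle per column.
        files = [stack.enter_context(open(p)) for p in column_paths]
        writer = csv.writer(out)
        for heading, *cells in zip(headings, *files):
            writer.writerow([heading] + [c.rstrip('\n') for c in cells])


def columns_via_files(loops, headings):
    """Run both steps in a temporary directory and return the CSV rows read back."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, values in enumerate(loops):  # one file per while-loop pass
            path = os.path.join(tmp, f'col_{i}.txt')
            dump_column(values, path)
            paths.append(path)
        out_path = os.path.join(tmp, 'out.csv')
        paste_columns(paths, headings, out_path)
        with open(out_path, newline='') as f:
            return list(csv.reader(f))
```

Since the column files contain text, values read back from the final CSV are strings.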

After tinkering for a while, I came up with an easier way of achieving the same goal. Assuming you have code like the following:

import csv

fruitList = ["Mango", "Apple", "Guava", "Grape", "Orange"]
vegList = ["Onion", "Garlic", "Shallot", "Pumpkin", "Potato"]
with open("NEWFILE.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    for value in range(len(fruitList)):
        writer.writerow([fruitList[value], vegList[value]])
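The index loop above can also be written by passing zip() of the lists straight to writerows(): zip pairs the i-th items of each list into row i, so each list ends up as a column. A small in-memory sketch (columns_to_csv_text is a hypothetical helper name):

```python
import csv
import io


def columns_to_csv_text(*columns):
    """Write each list as a column; zip() pairs the i-th items into row i."""
    buf = io.StringIO()
    csv.writer(buf).writerows(zip(*columns))
    return buf.getvalue()


fruitList = ["Mango", "Apple", "Guava", "Grape", "Orange"]
vegList = ["Onion", "Garlic", "Shallot", "Pumpkin", "Potato"]
print(columns_to_csv_text(fruitList, vegList))
```

With a real file this becomes csv.writer(csvfile).writerows(zip(fruitList, vegList)). One behavioral difference: the range(len(fruitList)) loop raises IndexError if vegList is shorter, while zip silently stops at the shorter list.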

zip will only take as many elements as the shortest list has. If your columns are not all the same length, you need to use zip_longest:

import csv
from itertools import zip_longest

data = [[1,2,3,4],[5,6]]
columns_data = zip_longest(*data)

with open("file.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(columns_data)
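By default zip_longest pads the shorter columns with None, which csv.writer writes as an empty field; its fillvalue argument lets you pick a different placeholder. A small in-memory sketch (ragged_columns_to_csv_text is a hypothetical name):

```python
import csv
import io
from itertools import zip_longest


def ragged_columns_to_csv_text(data, fill=None):
    """Write the lists in `data` as columns, padding short ones with `fill`."""
    buf = io.StringIO()
    csv.writer(buf).writerows(zip_longest(*data, fillvalue=fill))
    return buf.getvalue()


print(repr(ragged_columns_to_csv_text([[1, 2, 3, 4], [5, 6]])))
print(repr(ragged_columns_to_csv_text([[1, 2, 3, 4], [5, 6]], fill='NA')))
```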


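Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM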