Awkward CSV to 2D/3D list in Python

I have some data in a CSV file, shown further down. I am trying to get it into a sensible form so that I can plot the common x-axis [1, 2, 3] against the y-axes [18, 22, 24] and [58, 68, 55], with 'A' and 'B' as the legend entries.

My current thought is that the following structure would be easiest, although it repeats the x-axis values:

[['A',[1,'A1',18],[2,'A2',22],[3,'A3',24]],
 ['B',[1,'B4',58],[2,'B5',68],[3,'B6',55]]]
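The repeated x values in that layout are not really a problem, because the plot-ready sequences can be pulled back out of it per series. A minimal sketch, assuming exactly the nested layout above (series name followed by [x, label, y] triples); series_to_xy is a hypothetical helper name:

```python
def series_to_xy(data):
    """Split [[name, [x, label, y], ...], ...] into plot-ready pieces.

    Returns a dict mapping each series name to (xs, ys, labels).
    """
    result = {}
    for name, *points in data:
        xs = [p[0] for p in points]       # x position of each point
        labels = [p[1] for p in points]   # per-point label such as 'A1'
        ys = [p[2] for p in points]       # y value of each point
        result[name] = (xs, ys, labels)
    return result


data = [['A', [1, 'A1', 18], [2, 'A2', 22], [3, 'A3', 24]],
        ['B', [1, 'B4', 58], [2, 'B5', 68], [3, 'B6', 55]]]

print(series_to_xy(data)['B'])  # ([1, 2, 3], [58, 68, 55], ['B4', 'B5', 'B6'])
```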

Here is the ugly data. As you can probably tell, A and B are headers. 18 corresponds to A1 at point 1, 22 to A2 at point 2, and so on. I tried checking for the empty 'cell' and inserting into the current array, but it got very messy, and I am stuck trying to extend this so that it could cope with 50+ columns and 20+ lines.

,A,B
1,A1,B4
,18,58
2,A2,B5
,22,68
3,A3,B6
,24,55
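One way to build the nested structure proposed above for any number of columns is to walk the rows two at a time: each x row (x value plus one label per column) is followed by a y row (empty first cell plus one value per column). A minimal sketch, assuming the file strictly alternates like this and all values are integers; parse_awkward_csv is a hypothetical name, and io.StringIO stands in for the real file:

```python
import csv
import io

SAMPLE = """\
,A,B
1,A1,B4
,18,58
2,A2,B5
,22,68
3,A3,B6
,24,55
"""


def parse_awkward_csv(f):
    """Return [[header, [x, label, y], ...], ...] with one entry per data column."""
    rows = list(csv.reader(f))
    headers = rows[0][1:]              # skip the empty corner cell
    result = [[h] for h in headers]
    # rows[1::2] are the x/label rows, rows[2::2] the matching y rows
    for label_row, value_row in zip(rows[1::2], rows[2::2]):
        x = int(label_row[0])
        for col, (label, y) in enumerate(zip(label_row[1:], value_row[1:])):
            result[col].append([x, label, int(y)])
    return result


for series in parse_awkward_csv(io.StringIO(SAMPLE)):
    print(series)
# ['A', [1, 'A1', 18], [2, 'A2', 22], [3, 'A3', 24]]
# ['B', [1, 'B4', 58], [2, 'B5', 68], [3, 'B6', 55]]
```

For the real file, pass open('datafile1.csv', newline='') (inside a with block) instead of the StringIO object.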

The advice here was helpful, but I couldn't apply it to my situation. The code below worked for one column but required further manipulation, and it broke down once I added more columns to the CSV file.

import csv

arr = []

with open('datafile1.csv', 'r', newline='') as datafile:
    reader = csv.reader(datafile)
    for row in reader:
        if row[0] != "":
            # a row with something in the first cell starts a new entry
            arr.append(row)
        elif row[1] != "":
            # otherwise, insert the second cell into the most recent entry
            arr[-1].insert(len(arr), row[1])
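With the sample data shown above, that loop fails before reaching a second column: the header row ,A,B has an empty first cell, so it falls into the elif branch while arr is still empty and arr[-1] raises IndexError. Past the header, only row[1] is ever read, and insert(len(arr), ...) uses the number of entries seen so far as the position, so the value lands in a different slot for each entry. A minimal sketch of the same one-pass idea that skips the header and keeps every column of each y row (collect_rows is a hypothetical name; the sample is inlined via io.StringIO):

```python
import csv
import io

SAMPLE = """\
,A,B
1,A1,B4
,18,58
2,A2,B5
,22,68
3,A3,B6
,24,55
"""


def collect_rows(f):
    """Pair each x row with the y row that follows it, for any number of columns."""
    reader = csv.reader(f)
    next(reader)  # skip the header row (",A,B"), which would otherwise hit the y branch
    groups = []
    for row in reader:
        if not row:
            continue  # skip completely blank lines
        if row[0] != "":
            # x row: x value, then one label per column
            groups.append({"x": row[0], "labels": row[1:], "y": []})
        elif groups:
            # y row: empty first cell, then one value per column (all kept, not just row[1])
            groups[-1]["y"] = row[1:]
    return groups


for g in collect_rows(io.StringIO(SAMPLE)):
    print(g)
# first line printed: {'x': '1', 'labels': ['A1', 'B4'], 'y': ['18', '58']}
```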

Thanks in advance for any help you can provide!

If you want to plot your data, the best format is a list for x and a list of lists for y. Naturally, the point labels also go in a list of lists.

The legends are in the first line, so you can read that and be done with it. Then take every second line starting from the second one (lines[1::2]) to extract the x values and the labels, and every second line starting from the third one (lines[2::2]) to get all the y data. A little zip() and unpacking, and you're done.
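To see what the slicing and zip() unpacking actually produce, here is the same logic run on the sample rows written out as the lists of strings that csv.reader would return, without the plotting or the integer conversion:

```python
lines = (
    ['', 'A', 'B'],
    ['1', 'A1', 'B4'],
    ['', '18', '58'],
    ['2', 'A2', 'B5'],
    ['', '22', '68'],
    ['3', 'A3', 'B6'],
    ['', '24', '55'],
)

legends = lines[0][1:]
print(legends)      # ['A', 'B']

# every second line starting at index 1: x value + one label per column
print(lines[1::2])  # (['1', 'A1', 'B4'], ['2', 'A2', 'B5'], ['3', 'A3', 'B6'])

# zip(*rows) transposes rows into columns; the first column is x
x, *labels = zip(*lines[1::2])
print(x)            # ('1', '2', '3')
print(labels)       # [('A1', 'A2', 'A3'), ('B4', 'B5', 'B6')]

# every second line starting at index 2: empty cell + one y value per column
_, *y = zip(*lines[2::2])
print(y)            # [('18', '22', '24'), ('58', '68', '55')]
```

Note that the starred target always collects into a list, which is why labels and y print as lists of tuples.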

import csv

import matplotlib.pyplot as plt

def load_data(file):
    # read all rows at once; the with block closes the file afterwards
    with open(file, 'r', newline='') as f:
        lines = tuple(csv.reader(f))

    # first line: empty corner cell, then one legend entry per column
    legends = lines[0][1:]
    # lines 1, 3, 5, ...: x value followed by the point labels
    x, *labels = zip(*lines[1::2])
    # lines 2, 4, 6, ...: empty cell followed by the y values
    _, *y = zip(*lines[2::2])
    # must convert the data from strings to integers
    # if floats are allowed in the data, use `float` instead
    x = tuple(map(int, x))
    y = tuple(tuple(map(int, column)) for column in y)

    return x, y, legends, labels

def plot_columns(x, y, legends, labels):
    for y_col, legend, label_col in zip(y, legends, labels):
        plt.plot(x, y_col, label=legend)
        # write each point's label (e.g. 'A1') next to it
        for xi, yi, ilabel in zip(x, y_col, label_col):
            plt.annotate(ilabel, xy=(xi, yi), textcoords='data')
    plt.legend()
    plt.show()

plot_columns(*load_data('datafile1.csv'))

If you're on Python 2, the extended unpacking in x, *labels = zip(*lines[1::2]) is not allowed (it was added in Python 3). Instead, do it in steps:

# for x and labels
temp = zip(*lines[1::2])
x, labels = temp[0], temp[1:]
# for y
y = zip(*lines[2::2])[1:]
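On Python 3, zip() returns an iterator, so the indexing in the snippet above (temp[0], zip(...)[1:]) raises TypeError there. Wrapping the zip() calls in list() gives a step-by-step version that runs on both Python 2 and 3. A minimal sketch, with lines assumed to be the tuple of rows read from the CSV as in load_data (inlined here for the sample data):

```python
lines = (
    ['', 'A', 'B'],
    ['1', 'A1', 'B4'],
    ['', '18', '58'],
    ['2', 'A2', 'B5'],
    ['', '22', '68'],
    ['3', 'A3', 'B6'],
    ['', '24', '55'],
)

# for x and labels: list() makes the transposed columns indexable on Python 3 too
temp = list(zip(*lines[1::2]))
x, labels = temp[0], temp[1:]

# for y: drop the first (empty) column
y = list(zip(*lines[2::2]))[1:]

print(x)       # ('1', '2', '3')
print(labels)  # [('A1', 'A2', 'A3'), ('B4', 'B5', 'B6')]
print(y)       # [('18', '22', '24'), ('58', '68', '55')]
```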

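Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.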