
Awkward CSV to 2D/3D list in Python

I have some data in a CSV as shown further down. I am trying to get it into a sensible form, so I can plot the common x-axis [1, 2, 3] against y-axes [18, 22, 24] and [58, 68, 55] with 'A' and 'B' as the legends.

My current thoughts are that the following structure would be easiest, although I get repetition of the x-axis.

[['A',[1,'A1',18],[2,'A2',22],[3,'A3',24]],
 ['B',[1,'B4',58],[2,'B5',68],[3,'B6',55]]]

Here is the ugly data. As you can probably tell, A and B are headers. 18 corresponds to A1 at point 1, 22 to A2 at point 2, etc. I tried checking for the empty 'cell' and inserting into the current array, however it got very messy, and I am stuck trying to extend this so that it could cope with 50+ columns and 20+ lines.

,A,B
1,A1,B4
,18,58
2,A2,B5
,22,68
3,A3,B6
,24,55

The advice here was helpful but I couldn't apply it to my situation. The below code worked for one column but required further manipulation and broke down once I added additional columns to the CSV file.

import csv

arr = []

datafile = open('datafile1.csv', 'r', newline='')
reader = csv.reader(datafile)
for row in reader:
    if row[0] != "":
        #print(row)
        arr.append(row)
    elif row[1] != "":
        arr[-1].insert(len(arr),row[1])

datafile.close()

Thanks in advance for any help you can provide!

If you want to plot your data, the most convenient format is a list for x and a list of lists for y, with one inner list per column. The labels naturally go into a list of lists of the same shape.

The legends are in the first line, so you can read that and be done with it. Then take every second line, starting from the line after the header, to extract the x and label data, and every second line again with an offset of 1 to get all the y data. Transposing the rows with zip() and some unpacking does the rest.

import csv

import matplotlib.pyplot as plt

def load_data(file):
    # the context manager closes the file once all rows have been read
    with open(file, 'r', newline='') as f:
        lines = tuple(csv.reader(f))

    legends = lines[0][1:]
    x, *labels = zip(*lines[1::2])
    _, *y = zip(*lines[2::2])
    # must convert the data from strings to integers
    # if floats are allowed in the data, use `float` instead
    x = tuple(map(int, x))
    y = tuple(tuple(map(int, column)) for column in y)

    return x, y, legends, labels

def plot_columns(x, y, legends, labels):
    for k in range(len(y)):
        plt.plot(x, y[k])
        for xi, yi, ilabel in zip(x, y[k], labels[k]):
            plt.annotate(ilabel, xy=(xi, yi), textcoords='data')
    plt.legend(legends)
    plt.show()

plot_columns(*load_data('datafile1.csv'))
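
To see what the slicing and zip() steps inside load_data produce, here is a small sketch run on the sample data from the question. It reads from an in-memory string rather than a file; the SAMPLE name and the use of io.StringIO are assumptions for illustration only, not part of the code above.

```python
import csv
import io

# The sample data from the question, held in memory instead of a file.
SAMPLE = ",A,B\n1,A1,B4\n,18,58\n2,A2,B5\n,22,68\n3,A3,B6\n,24,55\n"

lines = tuple(csv.reader(io.StringIO(SAMPLE, newline='')))

legends = lines[0][1:]     # header row without the empty first cell
label_rows = lines[1::2]   # rows 1, 3, 5: x value followed by the labels
value_rows = lines[2::2]   # rows 2, 4, 6: empty cell followed by the y values

# zip(*rows) transposes rows into columns; the first column is split off.
x, *labels = zip(*label_rows)
_, *y = zip(*value_rows)

x = tuple(map(int, x))
y = [tuple(map(int, column)) for column in y]

print(legends)  # ['A', 'B']
print(x)        # (1, 2, 3)
print(labels)   # [('A1', 'A2', 'A3'), ('B4', 'B5', 'B6')]
print(y)        # [(18, 22, 24), (58, 68, 55)]
```

Because the transposition works on whole rows, the same steps should carry over unchanged to files with many more columns, as long as every row has the same number of cells.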

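If you're on Python 2, the extended unpacking in x, *labels = zip(*lines[1::2]) is a syntax error. Instead, do it in steps, which works there because zip() returns a list that can be indexed and sliced. Note also that Python 2's built-in open() has no newline argument, so the file should be opened in 'rb' mode instead.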

# for x and labels
temp = zip(*lines[1::2])
x, labels = temp[0], temp[1:]
# for y
y = zip(*lines[2::2])[1:]
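
If the nested structure proposed in the question is still wanted, it could be assembled from the column-wise data along these lines. This is only a sketch: the helper name to_nested is hypothetical, and the input values are hard-coded from the sample data rather than read from the file.

```python
def to_nested(x, y, legends, labels):
    """Build [[legend, [x, label, y], ...], ...] from column-wise data."""
    return [
        [legend] + [list(point) for point in zip(x, column_labels, column_values)]
        for legend, column_labels, column_values in zip(legends, labels, y)
    ]

# Values as they would come out of load_data for the sample CSV.
x = (1, 2, 3)
y = [(18, 22, 24), (58, 68, 55)]
legends = ['A', 'B']
labels = [('A1', 'A2', 'A3'), ('B4', 'B5', 'B6')]

nested = to_nested(x, y, legends, labels)
print(nested)
# [['A', [1, 'A1', 18], [2, 'A2', 22], [3, 'A3', 24]],
#  ['B', [1, 'B4', 58], [2, 'B5', 68], [3, 'B6', 55]]]
```

For plotting, the separate x, y, legends and labels sequences are usually easier to work with, since the nested form repeats the x values for every column.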

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 