
How to convert csv to multiple arrays without pandas?

I have a CSV file like this:

student_id,event_id,score
1,1,20
3,1,20
4,1,18
5,1,13
6,1,18
7,1,14
8,1,14
9,1,11
10,1,19
...

and I need to convert it into multiple arrays/lists like I did using pandas here:

import pandas as pd

scores = pd.read_csv("/content/score.csv", encoding='utf-8',
                     index_col=[])
student_id = scores['student_id'].values
event_id = scores['event_id'].values
score = scores['score'].values
print(scores.head())

As you can see, I get three arrays, which I need in order to run the data analysis. How can I do this with Python's csv module instead? I have to do it without pandas. Also, once I am done with the data, how can I export several new arrays to a CSV file? Again, I used pandas for this:

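# Map each output column name to its array of results.
# (A descriptive name is used here so the built-ins dict, max, min, sum and id are not shadowed.)
results = {'avg(score)': avgScore, 'max(score)': maxScore, 'min(score)': minScore, 'sum(score)': sumScore, 'student_id': student_id_data}

df = pd.DataFrame(results)

df.to_csv(r'/content/AnalyzedData.csv', index=False)

In case you are wondering, avgScore, maxScore, minScore, sumScore and student_id_data are all arrays.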

Here's a partial answer which will produce a separate list for each column in the CSV file.

import csv

csv_filepath = "score.csv"

with open(csv_filepath, "r", newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames

    lists = {column: [] for column in columns}  # Lists for each column.

    for row in reader:
        for column in columns:
            lists[column].append(int(row[column]))

    for column_name, column in lists.items():
        print(f'{column_name}: {column}')

Sample output:

student_id: [1, 3, 4, 5, 6, 7, 8, 9, 10]
event_id: [1, 1, 1, 1, 1, 1, 1, 1, 1]
score: [20, 20, 18, 13, 18, 14, 14, 11, 19]
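If you would rather end up with three separate variables, as in the pandas version from the question, a minimal sketch with csv.reader could look like the following. The in-memory sample string is only a stand-in for score.csv so that the snippet runs on its own, and like the code above it assumes every value is an integer.

```python
import csv
import io

# Stand-in for score.csv so the sketch is self-contained; with a real file,
# use open("score.csv", newline="") instead of io.StringIO(sample).
sample = "student_id,event_id,score\n1,1,20\n3,1,20\n4,1,18\n"

with io.StringIO(sample) as csv_file:
    reader = csv.reader(csv_file)
    header = next(reader)  # first line holds the column names
    rows = [[int(value) for value in row] for row in reader]

# zip(*rows) transposes the rows into columns; unpack them into three lists.
student_id, event_id, score = (list(column) for column in zip(*rows))

print(student_id)  # [1, 3, 4]
print(event_id)    # [1, 1, 1]
print(score)       # [20, 20, 18]
```

The unpacking on the last assignment only works when the file has exactly three columns; the dictionary approach above is more forgiving if the layout changes.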

You also asked how to do the reverse of this. Here's an example that I hope is self-explanatory:

# Dummy sample analysis data
length = len(lists['student_id'])
avgScore = list(range(length))
maxScore = list(range(length))
minScore = list(range(length))
sumScore = list(range(length))
student_ids = lists['student_id']

csv_output_filepath = 'analysis.csv'
fieldnames = ('avg(score)', 'max(score)', 'min(score)', 'sum(score)', 'student_id')

with open(csv_output_filepath, 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames)
    writer.writeheader()

    for values in zip(avgScore, maxScore, minScore, sumScore, student_ids):
        row = dict(zip(fieldnames, values))  # Combine into dictionary.
        writer.writerow(row)
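When the column order is fixed, a plain csv.writer may be a slightly shorter alternative, since writerows accepts the zipped lists directly. This is only a sketch: the result lists below are made-up values, and io.StringIO stands in for the output file so the snippet can run anywhere.

```python
import csv
import io

# Hypothetical analysis results, one entry per student.
avg_score = [15.0, 18.5]
max_score = [20, 19]
min_score = [10, 18]
sum_score = [30, 37]
student_ids = [1, 3]

header = ["avg(score)", "max(score)", "min(score)", "sum(score)", "student_id"]

# io.StringIO stands in for a file; with a real file, use
# open("analysis.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
# zip pairs up the i-th entry of every list, giving one CSV row per student.
writer.writerows(zip(avg_score, max_score, min_score, sum_score, student_ids))

print(buffer.getvalue())
```

DictWriter, as used above, is the safer choice when rows are built as dictionaries or the field order might change; csv.writer simply writes values in the order given.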

What you want to do does not require the csv module; it takes just three lines of code (one of them admittedly dense):

splitted_lines = (line.strip().split(',') for line in open('/path/to/your/data.csv'))
labels = next(splitted_lines)
arr = dict(zip(labels, zip(*((int(i) for i in ii) for ii in splitted_lines))))

  1. splitted_lines is a generator that iterates over your data file one line at a time and yields, for each line, a list of its items (three in your example).

  2. next(splitted_lines) returns the list with the split content of the first line, that is, our three labels.

  3. We fit our data into a dictionary. Calling dict with an iterable of 2-tuples initializes the dictionary from them; here that iterable is the result of the outer zip:

    • the 1st argument of the outer zip is labels, so the keys of the dictionary will be the labels of the columns

    • the 2nd argument is the result of an inner zip, used here because zipping the starred form of a sequence of sequences transposes it, so each key is paired with one row of the transpose of what follows the *, i.e. with one column of the data

      • what follows the * is simply (the generator equivalent of) a list of lists with, in your example, 9 rows of three integer values, so the 2nd argument of the outer zip is a sequence of three sequences of nine integers, which are paired with the corresponding three keys/labels
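The transposing effect of zip(*...) described above is easier to see on a small, hard-coded example. The three rows below are taken from the sample data in the question; everything else is just illustration.

```python
rows = [[1, 1, 20], [3, 1, 20], [4, 1, 18]]  # one inner list per CSV line
labels = ["student_id", "event_id", "score"]

# zip(*rows) is the same as zip(rows[0], rows[1], rows[2]):
# it groups the first items together, then the second items, and so on.
columns = list(zip(*rows))
print(columns)        # [(1, 3, 4), (1, 1, 1), (20, 20, 18)]

# Pair each label with its column to build the dictionary.
arr = dict(zip(labels, columns))
print(arr["score"])   # (20, 20, 18)
```

Note that each column comes out as a tuple, not a list; wrap it in list(...) if you need to modify it later.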

Here is an example of using the data collected by the previous three lines of code:

In [119]: print("\n".join("%15s:%s"%(l,','.join("%3d"%i for i in arr[l])) for l in labels))
     ...: 
     student_id:  1,  3,  4,  5,  6,  7,  8,  9, 10
       event_id:  1,  1,  1,  1,  1,  1,  1,  1,  1
          score: 20, 20, 18, 13, 18, 14, 14, 11, 19

In [120]: print(*arr['score'])
20 20 18 13 18 14 14 11 19

PS: If the question were about an assignment in some sort of Python 101 course, it's unlikely that my solution would be deemed acceptable.
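One caveat with splitting on commas by hand: it is fine for purely numeric data like the sample in the question, but it breaks as soon as a field contains a quoted comma. The line below is a hypothetical example, not taken from the question's file, that shows the difference between str.split and csv.reader.

```python
import csv
import io

# Hypothetical line whose second field contains a comma inside quotes.
line = '1,"Smith, Jane",20\n'

naive = line.strip().split(",")               # splits inside the quotes too
parsed = next(csv.reader(io.StringIO(line)))  # respects CSV quoting rules

print(naive)   # ['1', '"Smith', ' Jane"', '20']
print(parsed)  # ['1', 'Smith, Jane', '20']
```

So if the file might ever contain text fields, the csv module is probably the safer option.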

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 