I have a CSV file like this:
student_id,event_id,score
1,1,20
3,1,20
4,1,18
5,1,13
6,1,18
7,1,14
8,1,14
9,1,11
10,1,19
...
and I need to convert it into multiple arrays/lists like I did using pandas here:
scores = pd.read_csv("/content/score.csv", encoding = 'utf-8',
index_col = [])
student_id = scores['student_id'].values
event_id = scores['event_id'].values
score = scores['score'].values
print(scores.head())
As you can see, I get three arrays, which I need in order to run the data analysis. How can I do this using Python's csv module? I have to do this without pandas. Also, how can I export data from multiple new arrays into a CSV file when I am done with this data? Again, I used pandas to do this:
avg = avgScore
max = maxScore
min = minScore
sum = sumScore
id = student_id_data
dict = {'avg(score)': avg, 'max(score)': max, 'min(score)': min, 'sum(score)': sum, 'student_id': id}
df = pd.DataFrame(dict)
df.to_csv(r'/content/AnalyzedData.csv', index=False)
Those first 5 are arrays if you are wondering.
Here's a partial answer which will produce a separate list for each column in the CSV file.
import csv
csv_filepath = "score.csv"
with open(csv_filepath, "r", newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    lists = {column: [] for column in columns}  # Lists for each column.
    for row in reader:
        for column in columns:
            lists[column].append(int(row[column]))
for column_name, column in lists.items():
    print(f'{column_name}: {column}')
Sample output:
student_id: [1, 3, 4, 5, 6, 7, 8, 9, 10]
event_id: [1, 1, 1, 1, 1, 1, 1, 1, 1]
score: [20, 20, 18, 13, 18, 14, 14, 11, 19]
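If three separate variables are preferred (as in the pandas version in the question), one possible sketch uses csv.reader and transposes the rows with zip. The io.StringIO sample below is only a stand-in for the real file so the snippet runs on its own; it assumes every value in the file is an integer.

```python
import csv
import io

# Sample rows copied from the question; a real script would use
# open("score.csv", newline="") instead of io.StringIO.
sample = io.StringIO(
    "student_id,event_id,score\n"
    "1,1,20\n"
    "3,1,20\n"
    "4,1,18\n"
)

reader = csv.reader(sample)
header = next(reader)  # First row holds the column names.
rows = [[int(value) for value in row] for row in reader]

# zip(*rows) transposes rows into columns; unpack them into three lists.
student_id, event_id, score = (list(column) for column in zip(*rows))

print(student_id)  # [1, 3, 4]
print(event_id)    # [1, 1, 1]
print(score)       # [20, 20, 18]
```

The unpacking assumes exactly three columns, in the order shown in the header.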
You also asked how to do the reverse of this. Here's an example I hope is self-explanatory:
# Dummy sample analysis data
length = len(lists['student_id'])
avgScore = list(range(length))
maxScore = list(range(length))
minScore = list(range(length))
sumScore = list(range(length))
student_ids = lists['student_id']
csv_output_filepath = 'analysis.csv'
fieldnames = ('avg(score)', 'max(score)', 'min(score)', 'sum(score)', 'student_id')
with open(csv_output_filepath, 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames)
    writer.writeheader()
    for values in zip(avgScore, maxScore, minScore, sumScore, student_ids):
        row = dict(zip(fieldnames, values))  # Combine into dictionary.
        writer.writerow(row)
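As a possible alternative, since the lists are already in column order, csv.writer with writerows can write the zipped rows directly without building a dictionary per row. This is only a sketch: the result lists below are made-up placeholder values, and io.StringIO stands in for the output file so the snippet is self-contained.

```python
import csv
import io

# Hypothetical analysis results, one entry per student (placeholder values).
avg_score = [20.0, 19.0]
max_score = [20, 20]
min_score = [20, 18]
sum_score = [20, 38]
student_ids = [1, 3]

header = ['avg(score)', 'max(score)', 'min(score)', 'sum(score)', 'student_id']

# A real script would use open('analysis.csv', 'w', newline='') here.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
# zip() pairs up the i-th element of every list to form the i-th row.
writer.writerows(zip(avg_score, max_score, min_score, sum_score, student_ids))

print(buffer.getvalue())
```

Note that zip stops at the shortest list, so all result lists should have the same length.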
What you want to do does not require the csv module; it's just three lines of code (one of them admittedly dense):
splitted_lines = (line.rstrip('\n').split(',') for line in open('/path/to/your/data.csv'))
labels = next(splitted_lines)
arr = dict(zip(labels, zip(*((int(i) for i in ii) for ii in splitted_lines))))
splitted_lines is a generator that iterates over your data file one line at a time and provides you a list with the three (in your example) items in each line, line by line.

next(splitted_lines) returns the list with the (split) content of the first line, that is, our three labels.

We fit our data in a dictionary; by invoking dict it is possible to initialize it using a generator of 2-tuples, here the value of a zip:

zip's 1st argument is labels, so the keys of the dictionary will be the labels of the columns.

The 2nd argument is the result of the evaluation of an inner zip, but in this case zip is used because zipping the starred form of a sequence of sequences has the effect of transposing it, so the value associated to each key will be the transpose of what follows *.

What follows * is simply (the generator equivalent of) a list of lists with (in your example) 9 rows of three integer values, so that the second argument to the 1st zip is consequently a sequence of three sequences of nine integers, that are going to be coupled to the corresponding three keys/labels.
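To see the transposing effect of zip(*...) in isolation, here is a small sketch using the first three data rows from the question, typed in by hand instead of being read from a file:

```python
# Three rows of (student_id, event_id, score), typed in for illustration.
rows = [(1, 1, 20), (3, 1, 20), (4, 1, 18)]
labels = ['student_id', 'event_id', 'score']

# Starring the list passes each row as a separate argument to zip,
# which then groups the 1st items, the 2nd items, and the 3rd items.
columns = list(zip(*rows))
print(columns)  # [(1, 3, 4), (1, 1, 1), (20, 20, 18)]

# Pair each label with its column to build the dictionary.
arr = dict(zip(labels, columns))
print(arr['score'])  # (20, 20, 18)
```

The columns come out as tuples; wrap them in list() if mutable lists are needed.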
Here I have an example of using the data collected by the previous three lines of code
In [119]: print("\n".join("%15s:%s"%(l,','.join("%3d"%i for i in arr[l])) for l in labels))
     ...:
     student_id:  1,  3,  4,  5,  6,  7,  8,  9, 10
       event_id:  1,  1,  1,  1,  1,  1,  1,  1,  1
          score: 20, 20, 18, 13, 18, 14, 14, 11, 19
In [120]: print(*arr['score'])
20 20 18 13 18 14 14 11 19
P.S. If the question is about an assignment in some sort of Python 101 course, it's unlikely that my solution would be deemed acceptable.