How to convert csv to multiple arrays without pandas?
I have a CSV file like this:
student_id,event_id,score
1,1,20
3,1,20
4,1,18
5,1,13
6,1,18
7,1,14
8,1,14
9,1,11
10,1,19
...
and I need to convert it into multiple arrays/lists, like I did using pandas here:
scores = pd.read_csv("/content/score.csv", encoding = 'utf-8',
index_col = [])
student_id = scores['student_id'].values
event_id = scores['event_id'].values
score = scores['score'].values
print(scores.head())
As you can see, I get three arrays, which I need in order to run the data analysis. How can I do this using Python's csv library? I have to do this without using pandas. Also, how can I export data from multiple new arrays into a CSV file when I am done with this data? Again, I used pandas to do this:
avg = avgScore
max = maxScore
min = minScore
sum = sumScore
id = student_id_data
dict = {'avg(score)': avg, 'max(score)': max, 'min(score)': min, 'sum(score)': sum, 'student_id': id}
df = pd.DataFrame(dict)
df.to_csv(r'/content/AnalyzedData.csv', index=False)
Those first 5 are arrays, if you are wondering.
Here's a partial answer which will produce a separate list for each column in the CSV file.
import csv

csv_filepath = "score.csv"
with open(csv_filepath, "r", newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    lists = {column: [] for column in columns}  # Lists for each column.
    for row in reader:
        for column in columns:
            lists[column].append(int(row[column]))

for column_name, column in lists.items():
    print(f'{column_name}: {column}')
Sample output:
student_id: [1, 3, 4, 5, 6, 7, 8, 9, 10]
event_id: [1, 1, 1, 1, 1, 1, 1, 1, 1]
score: [20, 20, 18, 13, 18, 14, 14, 11, 19]
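If you want the three separate variables from the question rather than a dictionary of lists, the same approach can be wrapped in a small function. This is only a sketch: the helper name read_columns is made up for illustration, it assumes every column holds integers, and it reads from an in-memory sample (io.StringIO) instead of score.csv so that it runs on its own.

```python
import csv
import io

# In-memory stand-in for score.csv (assumption: every value is an integer).
SAMPLE = """student_id,event_id,score
1,1,20
3,1,20
4,1,18
"""


def read_columns(csv_file):
    """Return a dict mapping each column name to a list of ints."""
    reader = csv.DictReader(csv_file)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in columns:
            columns[name].append(int(row[name]))
    return columns


columns = read_columns(io.StringIO(SAMPLE))

# Pull the individual lists out by column name.
student_id = columns['student_id']
event_id = columns['event_id']
score = columns['score']
print(student_id, event_id, score)
```

With a real file you would pass the object returned by open(csv_filepath, newline='') instead of the StringIO.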
You also asked how to do the reverse of this. Here's an example that I hope is self-explanatory:
# Dummy sample analysis data
length = len(lists['student_id'])
avgScore = list(range(length))
maxScore = list(range(length))
minScore = list(range(length))
sumScore = list(range(length))
student_ids = lists['student_id']

csv_output_filepath = 'analysis.csv'
fieldnames = ('avg(score)', 'max(score)', 'min(score)', 'sum(score)', 'student_id')
with open(csv_output_filepath, 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames)
    writer.writeheader()
    for values in zip(avgScore, maxScore, minScore, sumScore, student_ids):
        row = dict(zip(fieldnames, values))  # Combine into dictionary.
        writer.writerow(row)
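Since the result lists are already in column order, a plain csv.writer with writerows is a possible alternative to building a dictionary per row. The sketch below is not part of the answer above: the sample values are invented, and an io.StringIO buffer stands in for the output file so the snippet is self-contained.

```python
import csv
import io

# Hypothetical analysis results, one entry per student.
avg_score = [17.5, 20.0]
max_score = [20, 20]
min_score = [15, 20]
sum_score = [35, 20]
student_ids = [1, 3]

header = ('avg(score)', 'max(score)', 'min(score)', 'sum(score)', 'student_id')

# io.StringIO stands in for open('analysis.csv', 'w', newline='').
out = io.StringIO(newline='')
writer = csv.writer(out)
writer.writerow(header)
# zip() pairs up the i-th element of every list, i.e. one CSV row per student.
writer.writerows(zip(avg_score, max_score, min_score, sum_score, student_ids))

print(out.getvalue())
```

Note that zip stops at the shortest list, so lists of unequal length would silently drop rows.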
What you want to do does not require the csv module; it's just three lines of code (one of them admittedly dense):
# strip() drops the trailing newline so the last label is 'score', not 'score\n'
splitted_lines = (line.strip().split(',') for line in open('/path/to/you/data.csv'))
labels = next(splitted_lines)
arr = dict(zip(labels, zip(*((int(i) for i in ii) for ii in splitted_lines))))
splitted_lines is a generator that iterates over your data file one line at a time and yields, for each line, a list of the three (in your example) items it contains.
next(splitted_lines) returns the list with the (split) content of the first line, that is, our three labels.
We fit our data in a dictionary; by calling the class constructor (i.e., by invoking dict) it is possible to initialize it from an iterable of 2-tuples, here the value of a zip:
the 1st argument to zip is labels, so the keys of the dictionary will be the labels of the columns
the 2nd argument is the result of evaluating an inner zip, but in this case zip is used because zipping the starred form of a sequence of sequences has the effect of transposing it, so the value associated with each key will be the transpose of what follows the *
what follows the * is simply (the generator equivalent of) a list of lists with (in your example) 9 rows of three integer values, so the second argument to the 1st zip is a sequence of three sequences of nine integers, which are going to be coupled to the corresponding three keys/labels
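The transposition step is easier to see in isolation. Here is a minimal sketch with a three-row stand-in for the parsed data (the literal values are just the first rows of the example file):

```python
# Rows as they would come out of the parsing step: one list per CSV line.
rows = [[1, 1, 20], [3, 1, 20], [4, 1, 18]]
labels = ['student_id', 'event_id', 'score']

# zip(*rows) is zip(rows[0], rows[1], rows[2]): it groups the first items
# of every row, then the second items, and so on, i.e. it transposes.
columns = list(zip(*rows))
print(columns)

# Pairing each label with its column gives the final dictionary.
arr = dict(zip(labels, columns))
print(arr['score'])
```

The columns come out as tuples, not lists; wrap each one in list() if you need to modify them later.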
Here is an example of using the data collected by the previous three lines of code:
In [119]: print("\n".join("%15s:%s"%(l,','.join("%3d"%i for i in arr[l])) for l in labels))
...:
     student_id:  1,  3,  4,  5,  6,  7,  8,  9, 10
       event_id:  1,  1,  1,  1,  1,  1,  1,  1,  1
          score: 20, 20, 18, 13, 18, 14, 14, 11, 19
In [120]: print(*arr['score'])
20 20 18 13 18 14 14 11 19
P.S. If the question were about an assignment in a sort of Python 101 course, it's unlikely that my solution would be deemed acceptable.