How to extract values of one field of a CSV file based on other fields?
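I have a CSV file with four fields: student_id, date_of_exam, subject, and marks. For each distinct combination of student_id and subject, I want to store the marks values in a list so that I can later perform operations on that list (for example, computing the average mark).
If I know a student_id and subject in advance, I can do this: I check them against every row in the CSV file and store the marks that correspond to that particular student_id and subject (as in the code snippet below). But how do I do this for every student? That is the part I can't seem to figure out.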
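import csv

student_id = 'a1'  # example values, assumed to be known in advance
subject = 'Maths'

values = []  # marks matching the chosen student and subject
with open('results_file.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for line in reader:
        if line[0] == student_id and line[2] == subject:
            values.append(float(line[3]))

print("Maximum: {}, Minimum: {}, Average: {}, Count: {}".format(max(values), min(values), sum(values) / len(values), len(values)))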
The CSV file looks like this:
student_id,date_of_exam,subject,marks
a1,2012-05-21,Maths,45
a2,2012-05-24,Physics,48
a2,2012-05-27,Chemistry,42
a1,2012-05-15,Language,35
a2,2012-05-21,Maths,49
a3,2012-05-15,Language,47
You can use dictionaries:
import csv

grades_per_student = {}
grades_per_subject = {}
with open('results_file.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for line in reader:
        mark = int(line[-1])  # convert so the lists hold numbers, not strings
        # setdefault creates the list the first time a key is seen
        grades_per_student.setdefault(line[0], []).append(mark)
        grades_per_subject.setdefault(line[2], []).append(mark)
Result:
grades_per_student = {'a1': [45, 35], 'a2': [48, 42, 49], 'a3': [47]}
grades_per_subject = {'Maths': [45, 49], 'Physics': [48], 'Chemistry': [42], 'Language': [35, 47]}
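The question asks for one list per distinct student_id and subject pair, while the dictionaries above group by student and by subject separately. A minimal sketch of the combined grouping follows, assuming a (student_id, subject) tuple as the dictionary key. The helper name marks_by_student_and_subject and the in-memory SAMPLE string are illustrative assumptions; a real script would pass an open file instead.

```python
import csv
import io
from statistics import mean

# Sample rows based on the question; hypothetical stand-in for results_file.csv.
SAMPLE = """student_id,date_of_exam,subject,marks
a1,2012-05-21,Maths,45
a2,2012-05-24,Physics,48
a2,2012-05-27,Chemistry,42
a1,2012-05-15,Language,35
a2,2012-05-21,Maths,49
a3,2012-05-15,Language,47
"""


def marks_by_student_and_subject(csv_file):
    """Group marks under (student_id, subject) keys."""
    grouped = {}
    # DictReader consumes the header row and exposes each row by column name.
    for row in csv.DictReader(csv_file):
        key = (row['student_id'], row['subject'])
        grouped.setdefault(key, []).append(float(row['marks']))
    return grouped


grouped = marks_by_student_and_subject(io.StringIO(SAMPLE))
for (student_id, subject), values in sorted(grouped.items()):
    print(student_id, subject, max(values), min(values), mean(values), len(values))
```

With this sample every pair occurs only once, so each list holds a single mark; repeated exams for the same pair would simply extend the list.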
You can use collections.defaultdict to store the marks for each student/subject:
import csv
from collections import defaultdict

with open('out.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip header
    marks = defaultdict(list)   # student_id -> list of marks
    grades = defaultdict(dict)  # student_id -> {subject: mark}
    subjects = set()
    for (student_id, date_of_exam, subject, mark) in reader:
        marks[student_id].append(int(mark))
        grades[student_id][subject] = int(mark)
        subjects.add(subject)

subjects = sorted(subjects)

# Per-student summary table
print('{: ^10}{: ^10}{: ^10}{: ^10}{: ^5}'.format('student_id', 'maximum', 'minimum', 'average', 'count'))
for student, student_marks in marks.items():
    print('{: ^10}{: ^10}{: ^10}{: ^10.2f}{: ^5}'.format(student, max(student_marks), min(student_marks), sum(student_marks) / len(student_marks), len(student_marks)))
print()

# Student-by-subject table; '-' marks a subject the student has no mark for
print('{: ^15}'.format('student\\subject'), end='')
for s in subjects:
    print('{: ^15}'.format(s), end='')
print()
for student_id, student_subjects in grades.items():
    print('{: ^15}'.format(student_id), end='')
    for s in subjects:
        if s in student_subjects:
            print('{: ^15}'.format(student_subjects[s]), end='')
        else:
            print('{: ^15}'.format('-'), end='')
    print()
Prints:
student_id maximum   minimum   average  count
    a1        45        35      40.00     2
    a2        49        42      46.33     3
    a3        47        47      47.00     1

student\subject   Chemistry      Language         Maths         Physics
      a1              -             35             45              -
      a2             42              -             49             48
      a3              -             47              -              -
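I suggest using the pandas library:
Use the pandas.read_csv function to read the data into a DataFrame.
Pass the usecols parameter to load only the columns you need from the CSV file (names, by contrast, only assigns column labels and does not select columns):
import pandas as pd
df = pd.read_csv('results_file.csv', usecols=['student_id', 'subject', 'marks'])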
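The answer stops once the data is loaded. One possible continuation, sketched here rather than taken from the original answer, is to group the DataFrame with groupby and compute the statistics with agg. The in-memory SAMPLE string is an illustrative stand-in for results_file.csv, and usecols is assumed for the column selection.

```python
import io

import pandas as pd

# Sample rows based on the question; hypothetical stand-in for results_file.csv.
SAMPLE = """student_id,date_of_exam,subject,marks
a1,2012-05-21,Maths,45
a2,2012-05-24,Physics,48
a2,2012-05-27,Chemistry,42
a1,2012-05-15,Language,35
a2,2012-05-21,Maths,49
a3,2012-05-15,Language,47
"""

# usecols keeps only the listed columns; the header row supplies their names.
df = pd.read_csv(io.StringIO(SAMPLE), usecols=['student_id', 'subject', 'marks'])

# One row of statistics per student.
per_student = df.groupby('student_id')['marks'].agg(['max', 'min', 'mean', 'count'])

# One row of statistics per (student, subject) pair.
per_pair = df.groupby(['student_id', 'subject'])['marks'].agg(['max', 'min', 'mean', 'count'])

print(per_student)
print(per_pair)
```

If plain Python lists are preferred over summary statistics, df.groupby('student_id')['marks'].apply(list) should yield one list of marks per student.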