python dynamically create dictionary
Given a list of strings in Python:
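logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]
where, for each entry split on whitespace,
s[0] = student ID,
s[1] = problem ID,
s[2] = score for the problem
I want to find out whether each student solved the same number of problems. E.g. student 0001 solved 6 problems and student 0002 solved 5 problems, but student 0001 attempted problem #5 twice, so student 0001 and student 0002 both solved 5 problems. I also need to check whether each student solved the same problem #s and got the same scores on the problems attempted. How can I write this as Pythonic code?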
To do this, you would loop over the list of strings and split each string on spaces:
logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
"0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]
for log in logs:
    s = log.split(' ')  # s[0] = student ID, s[1] = problem ID, s[2] = score
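As a small sketch of the same loop, the three fields can be unpacked into names and the score converted to a number right away. The names `student`, `problem`, `score` and `parsed` are chosen here for illustration only, and the list is a shortened sample of the one above.

```python
logs = ["0001 3 95", "0001 5 90", "0001 5 100"]  # shortened sample of the list above

parsed = []
for log in logs:
    # split() with no argument splits on any run of whitespace
    student, problem, score = log.split()
    # convert the score so that later comparisons are numeric, not alphabetical
    parsed.append((student, problem, int(score)))

print(parsed)
```

Converting early matters because strings compare character by character, so `"90" > "100"` is true while `90 > 100` is not.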
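You will need a few different groupings (dictionaries) to analyze the data along all these different axes.
First, organize the information into the various grouping axes: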
logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95",
"0001 7 80", "0001 8 80", "0001 10 90", "0002 10 90",
"0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]
students = dict()  # {studentID: {problemID: max score}} nested dictionaries
problems = dict()  # {problemID: {studentIDs}} dictionary of sets
results = dict()   # {(problemID, result): {studentIDs}} matching results
for s, p, r in map(str.split, logs):
    scores = students.setdefault(s, dict())        # track problems per student
    scores[p] = max(scores.get(p, r), r, key=int)  # max score, compared numerically
    problems.setdefault(p, set()).add(s)           # add student to problem's set
    results.setdefault((p, r), set()).add(s)       # add student to problem/result
You can then query these data structures to get the insights you are looking for.
Raw groupings:
# problems solved by each student with their maximum result
print(students)
{'0001': {'3': '95', '5': '100', '7': '80', '8': '80', '10': '90'},
 '0002': {'3': '95', '10': '90', '7': '80', '8': '80', '5': '100'},
 '0003': {'99': '90'}}
# list of students that solved each problem
print(problems)
{'3': {'0002', '0001'},
'5': {'0002', '0001'},
'7': {'0002', '0001'},
'8': {'0002', '0001'},
'10': {'0002', '0001'},
'99': {'0003'}}
# list of students that got a specific result on each problem
print(results)
{('3', '95'): {'0002', '0001'}, ('5', '90'): {'0001'},
('5', '100'): {'0002', '0001'}, ('7', '80'): {'0002', '0001'},
('8', '80'): {'0002', '0001'}, ('10', '90'): {'0002', '0001'},
('99', '90'): {'0003'}}
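The first two groupings can also be built with `collections.defaultdict`, which removes the `setdefault` calls. This is only a sketch of an alternative, not the answer's own code: the names `best` and `solvers` are made up for the example, and scores are stored as integers rather than strings.

```python
from collections import defaultdict

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95",
        "0001 7 80", "0001 8 80", "0001 10 90", "0002 10 90",
        "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

best = defaultdict(dict)    # {studentID: {problemID: best score as int}}
solvers = defaultdict(set)  # {problemID: {studentIDs}}

for line in logs:
    student, problem, score = line.split()
    score = int(score)
    # keep the higher of the stored score and the new one
    best[student][problem] = max(best[student].get(problem, score), score)
    solvers[problem].add(student)

for student, scores in best.items():
    print(student, scores)
for problem, who in solvers.items():
    print(problem, sorted(who))
```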
Information derived by aggregating/filtering:
# number of problems solved per student
print({s:len(pr) for s,pr in students.items()})
{'0001': 5, '0002': 5, '0003': 1}
# students that got the same score on the same problem (plagiarism?)
for (prob, result), group in results.items():  # 'group' avoids shadowing the students dict
    if len(group) > 1:
        print(f"# same result ({result}) on problem #{prob} :", group)
# same result (95) on problem #3 : {'0001', '0002'}
# same result (100) on problem #5 : {'0001', '0002'}
# same result (80) on problem #7 : {'0001', '0002'}
# same result (80) on problem #8 : {'0001', '0002'}
# same result (90) on problem #10 : {'0001', '0002'}
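The question also asks whether students solved the same problems with the same scores. One possible way to check this, assuming only the best score per problem counts, is to compare the per-student dictionaries directly. The `best` dictionary below is written out by hand to keep the sketch self-contained, and all names in it are illustrative.

```python
from itertools import combinations

# Best score per problem for each student (scores as integers),
# written out by hand here so the example runs on its own.
best = {
    "0001": {"3": 95, "5": 100, "7": 80, "8": 80, "10": 90},
    "0002": {"3": 95, "10": 90, "7": 80, "8": 80, "5": 100},
    "0003": {"99": 90},
}

# True only if every student solved the same number of distinct problems.
same_count = len({len(scores) for scores in best.values()}) == 1

pairs = list(combinations(sorted(best), 2))

# Pairs of students who solved exactly the same set of problems.
same_problems = [(a, b) for a, b in pairs if best[a].keys() == best[b].keys()]

# Pairs with the same problems and the same best score on each one.
# Dictionary equality ignores insertion order, so only content is compared.
identical = [(a, b) for a, b in pairs if best[a] == best[b]]

print(same_count)
print(same_problems)
print(identical)
```

With the sample data, students 0001 and 0002 match on both problems and scores, while 0003 matches nobody, so the overall count check is false.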
Note that a relational database is usually a better tool for this kind of analysis.
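To illustrate the database route, here is a minimal sketch using the standard-library `sqlite3` module with an in-memory database. The table name `attempts` and its column names are assumptions made for this example.

```python
import sqlite3

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95",
        "0001 7 80", "0001 8 80", "0001 10 90", "0002 10 90",
        "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE attempts (student TEXT, problem TEXT, score INTEGER)")
con.executemany(
    "INSERT INTO attempts VALUES (?, ?, ?)",
    [(s, p, int(r)) for s, p, r in map(str.split, logs)],
)

# Number of distinct problems solved by each student.
counts = con.execute(
    "SELECT student, COUNT(DISTINCT problem) FROM attempts "
    "GROUP BY student ORDER BY student"
).fetchall()

# Best score for each student on each problem.
best = con.execute(
    "SELECT student, problem, MAX(score) FROM attempts "
    "GROUP BY student, problem ORDER BY student, problem"
).fetchall()

con.close()

print(counts)
print(best)
```

The grouping that needed three dictionaries in plain Python becomes a `GROUP BY` clause per question, which is the main appeal of this approach.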