
Python: dynamically create a dictionary

Given a list of strings in Python:

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

where, for each entry s split on spaces:
s[0] = student ID,
s[1] = problem ID,
s[2] = score for the problem

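I would like to find out whether every student solved the same number of problems. For example, student 0001 solved 6 problems and student 0002 solved 5 problems, but student 0001 attempted problem #5 twice, so student 0001 and student 0002 both solved 5 distinct problems. I also need to check whether each student solved the same problem numbers and got the same scores on the problems they attempted. How can I write this as Pythonic code?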

To do this, iterate over the list of strings and split each string on spaces:

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]
for log in logs:
    s = log.split(' ')
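As a small sketch of that parsing step, each entry can be unpacked into three named fields instead of an indexed list. The helper name parse and the conversion of the score to int are assumptions made for illustration; they are not part of the original answer.

```python
logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95", "0001 7 80", "0001 8 80",
        "0001 10 90", "0002 10 90", "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

def parse(log):
    """Split one log entry into (student ID, problem ID, score as int)."""
    student, problem, score = log.split()  # split() with no argument splits on any whitespace
    return student, problem, int(score)

records = [parse(log) for log in logs]
print(records[0])  # ('0001', '3', 95)
```

Keeping the IDs as strings preserves the leading zeros, while an integer score avoids string comparisons later on.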

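You will need a few different groupings (dictionaries) to analyze the data along all of these axes.

First, organize the information into the individual groupings: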

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95",
        "0001 7 80", "0001 8 80", "0001 10 90", "0002 10 90",
        "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

students = dict() # {studentID: {problemID: max score}} nested dictionaries
problems = dict() # {problemID: {studentIDs}} dictionary of sets
results  = dict() # {(problemID,result): {studentIDs}} matching results
for s,p,r in map(str.split,logs):
    scores = students.setdefault(s,dict()) # track problems per student
    # compare scores numerically: as plain strings, '90' sorts above '100'
    scores[p] = max(scores.get(p,r),r,key=int) # max score for student/problem
    problems.setdefault(p,set()).add(s)    # add student to problem's set
    results.setdefault((p,r),set()).add(s) # add student to problem/result

You can then query these data structures to get the insights you are looking for.

Raw groupings:

# problems solved by each student with their maximum result
print(students)
{'0001': {'3': '95', '5': '100', '7': '80', '8': '80', '10': '90'},
 '0002': {'3': '95', '10': '90', '7': '80', '8': '80', '5': '100'},
 '0003': {'99': '90'}}

# list of students that solved each problem
print(problems)
{'3': {'0002', '0001'},
 '5': {'0002', '0001'},
 '7': {'0002', '0001'},
 '8': {'0002', '0001'},
 '10': {'0002', '0001'},
 '99': {'0003'}}

# list of students that got a specific result on each problem
print(results)
{('3', '95'): {'0002', '0001'}, ('5', '90'): {'0001'},
 ('5', '100'): {'0002', '0001'}, ('7', '80'): {'0002', '0001'},
 ('8', '80'): {'0002', '0001'}, ('10', '90'): {'0002', '0001'},
 ('99', '90'): {'0003'}}

Information derived by aggregating/filtering:

# number of problems solved per student
print({s:len(pr) for s,pr in students.items()}) 
{'0001': 5, '0002': 5, '0003': 1}
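To answer the first part of the question directly (did every student solve the same number of problems?), the per-student counts can be collapsed into a set and its size checked. This is a minimal, self-contained sketch; the names solved, counts and same_count are illustrative and do not come from the original answer.

```python
logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95",
        "0001 7 80", "0001 8 80", "0001 10 90", "0002 10 90",
        "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

solved = {}  # {studentID: {problemIDs}}; a set ignores repeated attempts
for student, problem, _score in map(str.split, logs):
    solved.setdefault(student, set()).add(problem)

counts = {student: len(probs) for student, probs in solved.items()}
# every student has the same count exactly when only one distinct count exists
same_count = len(set(counts.values())) == 1

print(counts)      # {'0001': 5, '0002': 5, '0003': 1}
print(same_count)  # False
```

With this data the answer is False only because of student 0003; students 0001 and 0002 both solved 5 distinct problems.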
    
# students that got the same score on the same problem (plagiarism?)
# (the loop variable is named group so it does not overwrite the students dict)
for (prob,result),group in results.items():
    if len(group)>1:
        print(f"# same result ({result}) on problem #{prob} :",group)

# same result (95) on problem #3 : {'0001', '0002'}
# same result (100) on problem #5 : {'0001', '0002'}
# same result (80) on problem #7 : {'0001', '0002'}
# same result (80) on problem #8 : {'0001', '0002'}
# same result (90) on problem #10 : {'0001', '0002'}
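The remaining part of the question (did the students solve the same problems with the same scores?) can be checked by comparing the per-student dictionaries themselves. The sketch below assumes that a student's best attempt is the score that counts, and it stores scores as integers; the names best, same_problems and same_scores are illustrative.

```python
logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95",
        "0001 7 80", "0001 8 80", "0001 10 90", "0002 10 90",
        "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

best = {}  # {studentID: {problemID: best score as int}}
for student, problem, score in map(str.split, logs):
    per_student = best.setdefault(student, {})
    value = int(score)
    per_student[problem] = max(per_student.get(problem, value), value)

# frozenset(dict) holds the keys, so this compares the sets of problem IDs
same_problems = len({frozenset(b) for b in best.values()}) == 1
# frozenset(dict.items()) compares (problem, score) pairs as well
same_scores = len({frozenset(b.items()) for b in best.values()}) == 1

print(same_problems, same_scores)    # False False
print(best["0001"] == best["0002"])  # True
```

Dictionary equality ignores insertion order, so two students can also be compared pairwise with a plain ==, as in the last line.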

Note that a relational database is usually a better tool for this kind of analysis.
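As a rough illustration of the relational approach, the same logs can be loaded into an in-memory SQLite database using Python's standard sqlite3 module. The table name attempts and its column names are assumptions chosen for this sketch.

```python
import sqlite3

logs = ["0001 3 95", "0001 5 90", "0001 5 100", "0002 3 95",
        "0001 7 80", "0001 8 80", "0001 10 90", "0002 10 90",
        "0002 7 80", "0002 8 80", "0002 5 100", "0003 99 90"]

conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
conn.execute("CREATE TABLE attempts (student TEXT, problem TEXT, score INTEGER)")
rows = [(s, p, int(r)) for s, p, r in map(str.split, logs)]
conn.executemany("INSERT INTO attempts VALUES (?, ?, ?)", rows)

# number of distinct problems solved per student
counts = conn.execute(
    "SELECT student, COUNT(DISTINCT problem) FROM attempts "
    "GROUP BY student ORDER BY student"
).fetchall()
print(counts)  # [('0001', 5), ('0002', 5), ('0003', 1)]

# best score per student and problem
best = conn.execute(
    "SELECT student, problem, MAX(score) FROM attempts "
    "GROUP BY student, problem ORDER BY student, problem"
).fetchall()
conn.close()
```

Storing score as INTEGER makes MAX compare numerically, which avoids the string-ordering pitfall of comparing '90' with '100'.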

