
Extracting information from a text file and converting it into a dictionary

I'm new to Python, so sorry if this is too easy. I usually work with R but want to try this out. I am trying to convert a CSV file containing student numbers, course IDs (7 courses in total) and ratings into a dictionary. This differs from other questions because the key in the file is not unique: each student number is repeated once for every course that student rated. The sample data (semicolon-delimited) look like this:

participant_id;course_id;rating
103;4;2
104;5;3.5
104;7;2.5
108;3;3.5
108;5;2
114;2;4.5
114;5;3.5
114;7;4.5
116;1;2
116;2;3
116;3;3
116;4;4
126;5;3
129;1;4
129;5;3.5
135;1;4.5
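Because the fields are separated by semicolons rather than commas, the standard `csv` module needs `delimiter=';'`. A minimal sketch, assuming the header shown above and using `io.StringIO` as a stand-in for the real file:

```python
import csv
import io

# A few rows from the sample above; with a real file you would use
# open(filename, newline='') instead of io.StringIO.
sample = io.StringIO(
    "participant_id;course_id;rating\n"
    "103;4;2\n"
    "104;5;3.5\n"
    "104;7;2.5\n"
)

# DictReader uses the header line as keys; delimiter=';' matches this file.
reader = csv.DictReader(sample, delimiter=';')
rows = [(r['participant_id'], int(r['course_id']), float(r['rating'])) for r in reader]
print(rows)  # [('103', 4, 2.0), ('104', 5, 3.5), ('104', 7, 2.5)]
```

Keeping `participant_id` as a string matches the string keys in the desired output below.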

So the optimal outcome would look like this: the student numbers are the keys, and each value is a list in which course_id determines the position (course 1 at index 0) and the rating is the value. All other positions are 'NA'.

{'103': ['NA', 'NA', 'NA', 2.0, 'NA', 'NA', 'NA'],
 '104': ['NA', 'NA', 'NA', 'NA', 3.5, 'NA', 2.5],
 '108': ['NA', 'NA', 3.5, 'NA', 2.0, 'NA', 'NA'],
 '114': ['NA', 4.5, 'NA', 'NA', 3.5, 'NA', 4.5],
 '116': [2.0, 3.0, 3.0, 4.0, 'NA', 'NA', 'NA'],
 '126': ['NA', 'NA', 'NA', 'NA', 3.0, 'NA', 'NA'],
 '129': [4.0, 'NA', 'NA', 'NA', 3.5, 'NA', 'NA'],
 '135': [4.5, 'NA', 'NA', 'NA', 'NA', 'NA', 'NA']}
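The core mapping is that course `k` lands at list index `k - 1`, with every other slot left as `'NA'`. A small sketch of that step for a single student; `to_row` and `N_COURSES` are hypothetical names, and the count of 7 comes from the question:

```python
N_COURSES = 7  # the question states there are 7 courses in total

def to_row(pairs, n_courses=N_COURSES):
    """Turn one student's (course_id, rating) pairs into a fixed-length list.

    course_id is 1-based, so course k is stored at list index k - 1;
    courses the student did not rate stay 'NA'.
    """
    row = ['NA'] * n_courses
    for course_id, rating in pairs:
        row[course_id - 1] = rating
    return row

print(to_row([(5, 3.5), (7, 2.5)]))
# ['NA', 'NA', 'NA', 'NA', 3.5, 'NA', 2.5]
```

This reproduces the row for student 104 above.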

I tried to extract the student numbers using set(), so I now have the unique student numbers and can build a dictionary with the right keys. However, all the course ratings are still NA, because I don't know how to extract course_id and rating for each student as a group and put them into the list. Here is my code so far:

def ratings(filename):
    with open(filename) as fp:
        buffer = fp.readlines()[1:]          # skip the header line
    stu_id = []
    dic = {}

    for i in buffer:
        stu_id.append(i.split(';')[0])       # first field is the student number

    stu_id_set = list(set(stu_id))           # unique student numbers
    for j in stu_id_set:
        dic[j] = ['NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA']   # one slot per course, all NA
    return dic
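The missing piece in the attempt above is collecting each student's (course_id, rating) pairs together. One standard-library way to do that grouping is `collections.defaultdict(list)`. In this sketch, `group_by_student` is a hypothetical helper that takes any iterable of lines (for example an open file object) so it can be tried without a file on disk:

```python
from collections import defaultdict

def group_by_student(lines):
    """Collect (course_id, rating) pairs per participant_id.

    `lines` is any iterable of 'participant_id;course_id;rating' strings,
    header first (for example an open file object).
    """
    groups = defaultdict(list)
    lines = iter(lines)
    next(lines, None)                 # skip the header line
    for line in lines:
        line = line.strip()
        if not line:                  # ignore blank lines
            continue
        pid, course_id, rating = line.split(';')
        groups[pid].append((int(course_id), float(rating)))
    return dict(groups)

sample = ["participant_id;course_id;rating", "116;1;2", "116;2;3", "126;5;3"]
print(group_by_student(sample))
# {'116': [(1, 2.0), (2, 3.0)], '126': [(5, 3.0)]}
```

Each group can then be spread into a 7-slot 'NA' list by writing each rating at index `course_id - 1`.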


We can do something like this:

def ratings(filename):
    d = {}
    max_col = 0                                     # Number of columns needed. Maximum course_id.
    idx_col_val_list = []

    with open(filename) as fp:
        fp.readline()                               # Ignore "participant_id;course_id;rating"

        for line in fp:                             # Iterate over the remaining lines
            line = line.strip()
            if not line:                            # Skip blank lines (e.g. a trailing newline)
                continue
            idx, col, val = line.split(';')
            col = int(col)
            val = float(val)

            max_col = max(max_col, col)
            idx_col_val_list.append((idx, col, val))

    for idx, col, val in idx_col_val_list:
        if idx not in d:
            d[idx] = ['NA'] * max_col
        d[idx][col - 1] = val

    return d


ans = ratings('input.txt')

assert ans == {
    '103': ['NA', 'NA', 'NA', 2.0, 'NA', 'NA', 'NA'],
    '104': ['NA', 'NA', 'NA', 'NA', 3.5, 'NA', 2.5],
    '108': ['NA', 'NA', 3.5, 'NA', 2.0, 'NA', 'NA'],
    '114': ['NA', 4.5, 'NA', 'NA', 3.5, 'NA', 4.5],
    '116': [2.0, 3.0, 3.0, 4.0, 'NA', 'NA', 'NA'],
    '126': ['NA', 'NA', 'NA', 'NA', 3.0, 'NA', 'NA'],
    '129': [4.0, 'NA', 'NA', 'NA', 3.5, 'NA', 'NA'],
    '135': [4.5, 'NA', 'NA', 'NA', 'NA', 'NA', 'NA'],
}
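One design choice worth noting: the answer infers the list length from the largest course_id present in the file. If course 7 happened not to appear, every list would be shorter than 7. A sketch of a variant that accepts an optional fixed length; `ratings_from_lines` and its `n_courses` parameter are hypothetical, and it reads from an iterable of lines rather than a file name:

```python
def ratings_from_lines(lines, n_courses=None):
    """Hypothetical variant of ratings() that takes an iterable of lines.

    With n_courses=None the list length is inferred from the largest course_id
    present (as the answer does); otherwise every list has n_courses slots.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('participant_id'):
            continue                      # skip blank lines and the header
        idx, col, val = line.split(';')
        rows.append((idx, int(col), float(val)))

    width = n_courses if n_courses is not None else max((c for _, c, _ in rows), default=0)
    result = {}
    for idx, col, val in rows:
        # setdefault creates a fresh 'NA' list the first time a student is seen
        result.setdefault(idx, ['NA'] * width)[col - 1] = val
    return result


lines = ["participant_id;course_id;rating", "103;4;2", "104;5;3.5"]
print(ratings_from_lines(lines))               # lists of length 5 (largest course_id is 5)
print(ratings_from_lines(lines, n_courses=7))  # lists of length 7
```

Since the question says there are 7 courses in total, passing the count explicitly keeps every list the same length regardless of which courses appear in the data.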

Here's a compact approach using pandas and dictionaries:

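import pandas as pd

# The file is semicolon-delimited, so sep=';' is required.
# Reading participant_id as str makes the keys match the desired output ('103', ...).
df = pd.read_csv('your_csv_file.csv', sep=';', dtype={'participant_id': str})

# build a list of dictionaries
# each element will look like {'participant_id': '103', 'course_id': 4, 'rating': 2.0}
records = df.to_dict(orient='records')

# initialize the final dictionary
# assign a 7-element list to each participant, filled with 'NA'
performance = {r['participant_id']: 7 * ['NA'] for r in records}

# populate the final dictionary
# course_id is 1-based while list indices are 0-based, hence the - 1
for r in records:
    performance[r['participant_id']][r['course_id'] - 1] = r['rating']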
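A possible alternative within pandas is to reshape with `DataFrame.pivot` so that each course becomes a column, then convert each row to a list. This is a sketch, not part of the answer above; it assumes each student rates a given course at most once (`pivot` raises an error on duplicate index/column pairs) and uses `io.StringIO` in place of the real file:

```python
import io

import pandas as pd

# A few rows from the question's sample; pass the file path instead of io.StringIO(...).
data = io.StringIO(
    "participant_id;course_id;rating\n"
    "103;4;2\n"
    "104;5;3.5\n"
    "104;7;2.5\n"
)
df = pd.read_csv(data, sep=';', dtype={'participant_id': str})

# One row per participant, one column per course_id; missing ratings become NaN.
wide = df.pivot(index='participant_id', columns='course_id', values='rating')

# Force columns 1..7 even if some course never appears in the data.
wide = wide.reindex(columns=range(1, 8))

# Replace NaN with the string 'NA' (object dtype so strings and floats can mix).
wide = wide.astype(object).where(wide.notna(), 'NA')

result = {pid: row.tolist() for pid, row in wide.iterrows()}
print(result)
```

The `reindex` step plays the same role as the fixed 7-element list in the answer: every participant gets exactly seven slots.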

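Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.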

 