Reading lines from a txt file and creating a dictionary whose values are lists of tuples
student.txt:
Akçam Su Tilsim PSYC 3.9
Aksel Eda POLS 2.78
Alpaydin Dilay ECON 1.2
Atil Turgut Uluç IR 2.1
Deveci Yasemin PSYC 2.9
Erserçe Yasemin POLS 3.0
Gülle Halil POLS 2.7
Gündogdu Ata Alp ECON 4.0
Gungor Muhammed Yasin POLS 3.1
Hammoud Rawan IR 1.7
Has Atakan POLS 1.97
Ince Kemal Kahriman IR 2.0
Kaptan Deniz IR 3.5
Kestir Bengisu IR 3.8
Koca Aysu ECON 2.5
Kolayli Sena Göksu IR 2.8
Kumman Gizem PSYC 2.9
Madenoglu Zeynep PSYC 3.1
Naghiyeva Gulustan IR 3.8
Ok Arda Mert IR 3.2
Var Berna ECON 2.9
Yeltekin Sude PSYC 1.2
Hello, I want to write a function that reads the information about each student in the file into a dictionary where the keys are the departments and the values are lists of the students in each department (lists of tuples). The information about each student is stored in a tuple (surname, GPA). Students in the file may have more than one given name, but only the surname and GPA are stored. The function should return the dictionary. (The surname is the first word on each line.)
This is what I tried:
def read_student(ifile):
    D={}
    f1=open(ifile,'r')
    for line in f1:
        tab=line.find('\t')
        space=line.rfind(' ')
        rtab=line.rfind('\t')
        student_surname=line[0:tab]
        gpa=line[space+1:]
        department=line[rtab+1:space]
        if department not in D:
            D[department]=[(student_surname,gpa)]
        else:
            D[department].append((student_surname,gpa))
    f1.close()
    return D
print(read_student('student.txt'))
I think the main problem is that the separators are inconsistent: sometimes a tab follows a word and sometimes a space does, so I don't know how to use the find function properly in this case.
See below. You will still have to reduce each name to the surname, but the rest of the details in the question are handled.
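A short trace of why the find-based attempt goes wrong, assuming a line that contains only spaces (like the sample lines above). find() and rfind() return -1 when the character is absent, and a slice ending at -1 silently drops the last character instead of failing. The same trace also shows str.split() with no argument, which treats spaces, tabs and newlines alike. The mixed-separator line is a hypothetical example.

```python
# A sample line as shown in the question (space-separated, no tabs).
line = "Akçam Su Tilsim PSYC 3.9\n"

tab = line.find('\t')     # -1: there is no tab in this line
rtab = line.rfind('\t')   # -1 as well
space = line.rfind(' ')   # index of the last space, just before the GPA

# With tab == -1, line[0:tab] is line[0:-1]: the whole line minus the newline,
# not the surname.
print(repr(line[0:tab]))           # 'Akçam Su Tilsim PSYC 3.9'

# With rtab == -1, line[rtab+1:space] is line[0:space]: everything before the GPA.
print(repr(line[rtab + 1:space]))  # 'Akçam Su Tilsim PSYC'

# str.split() with no argument splits on any run of whitespace (spaces, tabs,
# newlines) and drops empty strings, so mixed separators no longer matter.
mixed = "Akçam\tSu Tilsim  PSYC\t3.9\n"  # hypothetical line mixing tabs and spaces
print(mixed.split())               # ['Akçam', 'Su', 'Tilsim', 'PSYC', '3.9']
```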
from collections import defaultdict
data = defaultdict(list)
with open('data.txt', encoding="utf-8") as f:
    lines = [l.strip() for l in f.readlines()]
for line in lines:
    first_space_idx = line.rfind(' ')  # last space, just before the grade
    sec_space_idx = line.rfind(' ', 0, first_space_idx - 1)  # space before the department
    grade = line[first_space_idx+1:]
    dep = line[sec_space_idx+1:first_space_idx]  # +1 so the key has no leading space
    student = line[:sec_space_idx].strip()
    data[dep].append((student, grade))
for dep, students in data.items():
    print(f'{dep} --> {students}')
Output:
PSYC --> [('Akçam Su Tilsim', '3.9'), ('Deveci Yasemin', '2.9'), ('Kumman Gizem', '2.9'), ('Madenoglu Zeynep', '3.1'), ('Yeltekin Sude', '1.2')]
POLS --> [('Aksel Eda', '2.78'), ('Erserçe Yasemin', '3.0'), ('Gülle Halil', '2.7'), ('Gungor Muhammed Yasin', '3.1'), ('Has Atakan', '1.97')]
ECON --> [('Alpaydin Dilay', '1.2'), ('Gündogdu Ata Alp', '4.0'), ('Koca Aysu', '2.5'), ('Var Berna', '2.9')]
IR --> [('Atil Turgut Uluç', '2.1'), ('Hammoud Rawan', '1.7'), ('Ince Kemal Kahriman', '2.0'), ('Kaptan Deniz', '3.5'), ('Kestir Bengisu', '3.8'), ('Kolayli Sena Göksu', '2.8'), ('Naghiyeva Gulustan', '3.8'), ('Ok Arda Mert', '3.2')]
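The answer above keeps the full name, while the question asks for the surname only. A minimal sketch of one way to finish the job, assuming the data dictionary has the shape shown in the output (the two entries below are copied from it): keep the first word of each name, since the question says that is the surname. Doing `.split()[0]` directly when building `student` inside the loop would work just as well.

```python
from collections import defaultdict

# Sample in the shape produced by the answer above:
# department -> list of (full name, grade string) tuples.
data = defaultdict(list)
data['PSYC'].append(('Akçam Su Tilsim', '3.9'))
data['IR'].append(('Atil Turgut Uluç', '2.1'))

# Keep only the first word of each name, which the question says is the surname.
surnames = {
    dep: [(name.split()[0], grade) for name, grade in students]
    for dep, students in data.items()
}
print(surnames)  # {'PSYC': [('Akçam', '3.9')], 'IR': [('Atil', '2.1')]}
```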
Why mess with rfind and find when you can simply split?
def read_student(ifile):
    D = {}
    f1 = open(ifile,'r')
    for line in f1:
        cols = line.split() # Splits at one or more whitespace
        surname = cols[0].strip()
        department = cols[-2].strip() # Because you know the last-but-one is dept
        gpa = float(cols[-1].strip()) # Because you know the last one is GPA
        fname = ' '.join(cols[1:-2]).strip()
        # cols[1:-2] gives you everything starting at col 1 up to but excluding the second-last.
        # Then you join these with spaces.
        if department not in D:
            D[department] = [(surname, gpa)]
        else:
            D[department].append((surname, gpa))
    f1.close()
    return D
If you know that your columns are always separated by \t, you can use cols = line.split('\t') instead. Then the surname is in the first column, the students' given names (fname) in the second, the department in the third, and the GPA in the fourth.
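A small illustration of the tab-only variant, under the assumption that the file really is strictly tab-delimited with exactly four columns (surname, given names, department, GPA). The sample line is hypothetical. Note that split('\t') does not remove the trailing newline, so the GPA column needs stripping.

```python
# Assumption: strictly tab-delimited, four columns per line. Hypothetical sample.
line = "Akçam\tSu Tilsim\tPSYC\t3.9\n"

cols = line.rstrip('\n').split('\t')
surname, given_names, department, gpa = cols
print(cols)                        # ['Akçam', 'Su Tilsim', 'PSYC', '3.9']

# Without rstrip('\n'), the last column would still carry the newline.
print(repr(line.split('\t')[-1]))  # '3.9\n'
```

Unlike split() with no argument, this keeps multi-word given names together in one column, but it breaks as soon as a line uses spaces instead of tabs.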
A couple of suggestions:
- You can use defaultdict to avoid checking if department not in D every time.
- You can use with to manage reading the file, so you don't have to worry about f1.close(). This is the preferred way to read files in Python.
You can use split(' ', 1) to extract the surname. It returns a list with two elements, the first of which is the surname. Then split the second element again using rsplit(' ', 1). This again gives a two-element list: the first element holds the given names and the department, and the second is the GPA. Finally, split that first element to get the department.
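To make the three splitting steps concrete, here is a trace on one line from student.txt, using the same variable names as the function that follows (fname holds the surname there).

```python
line = "Gündogdu Ata Alp ECON 4.0\n"

# Step 1: split once at the first space -> surname and the rest.
fname, data = line.strip().split(' ', 1)
print(fname, '|', data)  # Gündogdu | Ata Alp ECON 4.0

# Step 2: split once from the right -> everything before the GPA, and the GPA.
data, gpa = data.rsplit(' ', 1)
print(data, '|', gpa)    # Ata Alp ECON | 4.0

# Step 3: the last word of what remains is the department.
dept = data.split()[-1]
print(dept)              # ECON
```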
def read_student(ifile):
    d = {}
    with open(ifile) as fp:
        for line in fp:
            fname, data = line.strip().split(' ', 1)
            data, gpa = data.rsplit(' ', 1)
            dept = data.split()[-1]
            d.setdefault(dept, []).append((fname, gpa))
    return d
print(read_student('student.txt'))
Output (pretty-printed):
{'ECON': [('Alpaydin', '1.2'),
('Gündogdu', '4.0'),
('Koca', '2.5'),
('Var', '2.9')],
'IR': [('Atil', '2.1'),
('Hammoud', '1.7'),
('Ince', '2.0'),
('Kaptan', '3.5'),
('Kestir', '3.8'),
('Kolayli', '2.8'),
('Naghiyeva', '3.8'),
('Ok', '3.2')],
'POLS': [('Aksel', '2.78'),
('Erserçe', '3.0'),
('Gülle', '2.7'),
('Gungor', '3.1'),
('Has', '1.97')],
'PSYC': [('Akçam', '3.9'),
('Deveci', '2.9'),
('Kumman', '2.9'),
('Madenoglu', '3.1'),
('Yeltekin', '1.2')]}
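One caveat, given that the question mentions a mix of tabs and spaces: split(' ', 1) and rsplit(' ', 1) only recognise a literal space. A sketch with a hypothetical mixed line shows the problem. Passing maxsplit without a separator keeps the same three steps but splits on any whitespace.

```python
line = "Akçam\tSu Tilsim PSYC\t3.9\n"  # hypothetical line mixing tabs and spaces

# split(' ', 1) only splits at a space, so the tab stays glued to the surname.
print(line.strip().split(' ', 1))  # ['Akçam\tSu', 'Tilsim PSYC\t3.9']

# With no separator, split/rsplit treat any run of whitespace as one separator
# and ignore leading/trailing whitespace on the side they split from.
surname, rest = line.split(maxsplit=1)
rest, gpa = rest.rsplit(maxsplit=1)
dept = rest.split()[-1]
print(surname, dept, gpa)          # Akçam PSYC 3.9
```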
This solution uses itemgetter to simplify extracting the variables: surname, department and GPA.
from operator import itemgetter
d = dict()
with open('f0.txt', 'r') as f:
    for line in f:
        name, dept, gpa = itemgetter(0, -2, -1)(line.split())
        d.setdefault(dept, []).append((name, gpa))
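The itemgetter snippet above runs at module level. A minimal sketch of it wrapped in the function the question asks for, combined with the defaultdict suggestion from earlier, might look like this. Several details are assumptions rather than part of the answer: the file is read as UTF-8 (names such as Akçam and Gülle contain non-ASCII characters), blank lines are skipped, and the GPA is converted to float as one of the other answers does (drop float() to keep strings). The temporary file exists only to make the example runnable.

```python
import os
import tempfile
from collections import defaultdict
from operator import itemgetter

# Picks the first word (surname), second-to-last (department) and last (GPA).
get_fields = itemgetter(0, -2, -1)


def read_student(ifile):
    """Return {department: [(surname, gpa), ...]} for a whitespace-separated file."""
    d = defaultdict(list)
    # Assumption: the file is UTF-8 encoded.
    with open(ifile, encoding='utf-8') as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines, e.g. an empty last line
            surname, dept, gpa = get_fields(line.split())
            d[dept].append((surname, float(gpa)))
    return dict(d)  # plain dict, as the question asks for a dictionary


# Usage: write three lines from student.txt to a temporary file and read them back.
sample = "Akçam Su Tilsim PSYC 3.9\nAksel Eda POLS 2.78\nDeveci Yasemin PSYC 2.9\n"
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt',
                                 delete=False) as tmp:
    tmp.write(sample)
result = read_student(tmp.name)
os.remove(tmp.name)
print(result)  # {'PSYC': [('Akçam', 3.9), ('Deveci', 2.9)], 'POLS': [('Aksel', 2.78)]}
```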
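Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.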