student.txt:
Akçam Su Tilsim PSYC 3.9
Aksel Eda POLS 2.78
Alpaydin Dilay ECON 1.2
Atil Turgut Uluç IR 2.1
Deveci Yasemin PSYC 2.9
Erserçe Yasemin POLS 3.0
Gülle Halil POLS 2.7
Gündogdu Ata Alp ECON 4.0
Gungor Muhammed Yasin POLS 3.1
Hammoud Rawan IR 1.7
Has Atakan POLS 1.97
Ince Kemal Kahriman IR 2.0
Kaptan Deniz IR 3.5
Kestir Bengisu IR 3.8
Koca Aysu ECON 2.5
Kolayli Sena Göksu IR 2.8
Kumman Gizem PSYC 2.9
Madenoglu Zeynep PSYC 3.1
Naghiyeva Gulustan IR 3.8
Ok Arda Mert IR 3.2
Var Berna ECON 2.9
Yeltekin Sude PSYC 1.2
Hello, I want to write a function that reads the information about each student in the file into a dictionary where the keys are the departments and the values are lists of the students in each department (lists of tuples). The information about each student is stored in a tuple containing (surname, GPA). Students in the file may have more than one name, but only the surname and GPA will be stored. The function should return the dictionary. (The surname is the first word on each line.)
This is what I tried:
def read_student(ifile):
    D={}
    f1=open(ifile,'r')
    for line in f1:
        tab=line.find('\t')
        space=line.rfind(' ')
        rtab=line.rfind('\t')
        student_surname=line[0:tab]
        gpa=line[space+1:]
        department=line[rtab+1:space]
        if department not in D:
            D[department]=[(student_surname,gpa)]
        else:
            D[department].append((student_surname,gpa))
    f1.close()
    return D
print(read_student('student.txt'))
I think the main problem is that the separators are inconsistent: sometimes a tab follows a word and sometimes a space does, so I don't know how to use the find function properly in this case.
See below. You will still have to take care of extracting the surname, but the rest of the details in the question are handled.
from collections import defaultdict
data = defaultdict(list)
with open('data.txt', encoding="utf-8") as f:
    lines = [l.strip() for l in f.readlines()]
    for line in lines:
        # Index of the last space (just before the GPA)
        first_space_idx = line.rfind(' ')
        # Index of the space before that (just before the department)
        sec_space_idx = line.rfind(' ', 0,first_space_idx - 1)
        grade = line[first_space_idx+1:]
        # strip() drops the leading space that the slice would otherwise keep
        dep = line[sec_space_idx:first_space_idx].strip()
        student = line[:sec_space_idx].strip()
        data[dep].append((student, grade))
for dep, students in data.items():
    print(f'{dep} --> {students}')
Output:
PSYC --> [('Akçam Su Tilsim', '3.9'), ('Deveci Yasemin', '2.9'), ('Kumman Gizem', '2.9'), ('Madenoglu Zeynep', '3.1'), ('Yeltekin Sude', '1.2')]
POLS --> [('Aksel Eda', '2.78'), ('Erserçe Yasemin', '3.0'), ('Gülle Halil', '2.7'), ('Gungor Muhammed Yasin', '3.1'), ('Has Atakan', '1.97')]
ECON --> [('Alpaydin Dilay', '1.2'), ('Gündogdu Ata Alp', '4.0'), ('Koca Aysu', '2.5'), ('Var Berna', '2.9')]
IR --> [('Atil Turgut Uluç', '2.1'), ('Hammoud Rawan', '1.7'), ('Ince Kemal Kahriman', '2.0'), ('Kaptan Deniz', '3.5'), ('Kestir Bengisu', '3.8'), ('Kolayli Sena Göksu', '2.8'), ('Naghiyeva Gulustan', '3.8'), ('Ok Arda Mert', '3.2')]
Why mess with rfind and find when you can simply split?
def read_student(ifile):
    D = {}
    f1 = open(ifile,'r')
    for line in f1:
        cols = line.split() # Splits at one or more whitespace
        surname = cols[0].strip()
        department = cols[-2].strip() # Because you know the last-but-one is dept
        gpa = float(cols[-1].strip()) # Because you know the last one is GPA
        fname = ' '.join(cols[1:-2]).strip()
        # cols[1:-2] gives you everything starting at col 1 up to but excluding the second-last.
        # Then you join these with spaces.
        if department not in D:
            D[department] = [(surname, gpa)]
        else:
            D[department].append((surname, gpa))
    f1.close()
    return D
If you know that your columns are always separated by \t, you can do cols = line.split('\t') instead. Then you have the student's fname in the second column, the department in the third, and the GPA in the fourth.
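As a rough illustration of the tab-only case, here is a minimal sketch. It assumes every line has exactly four tab-separated fields (surname, given names, department, GPA); the helper name parse_tab_line and the sample line are hypothetical, and the real file in the question mixes tabs and spaces, so this would not work on it unchanged.

```python
def parse_tab_line(line):
    # Assumption: exactly four tab-separated fields per line.
    # Unpacking raises ValueError if a line has a different number of fields.
    surname, given_names, department, gpa = line.rstrip('\n').split('\t')
    return surname, given_names, department, float(gpa)

# Hypothetical, strictly tab-separated line (not taken from the real file).
sample = "Akçam\tSu Tilsim\tPSYC\t3.9\n"
print(parse_tab_line(sample))
```

Because split('\t') does not collapse runs of whitespace, given names that contain spaces stay together in one field.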
A couple of suggestions:
Use defaultdict to avoid checking if department not in D every time.
Use with to manage reading the file so you don't have to worry about f1.close(). This is the preferred way to read files in Python.
You can use split(' ', 1) to extract the surname. It gives a list with two elements; the first one is the surname. Then split the second element using rsplit(' ', 1). This again gives a list with two elements: the first holds the remaining names and the department, and the second is the GPA. Finally, split the first element to get the department.
def read_student(ifile):
    d = {}
    with open(ifile) as fp:
        for line in fp:
            fname, data = line.strip().split(' ', 1)
            data, gpa = data.rsplit(' ', 1)
            dept = data.split()[-1]
            d.setdefault(dept, []).append((fname, gpa))
    return d
print(read_student('student.txt'))
Output:
{'ECON': [('Alpaydin', '1.2'),
('Gündogdu', '4.0'),
('Koca', '2.5'),
('Var', '2.9')],
'IR': [('Atil', '2.1'),
('Hammoud', '1.7'),
('Ince', '2.0'),
('Kaptan', '3.5'),
('Kestir', '3.8'),
('Kolayli', '2.8'),
('Naghiyeva', '3.8'),
('Ok', '3.2')],
'POLS': [('Aksel', '2.78'),
('Erserçe', '3.0'),
('Gülle', '2.7'),
('Gungor', '3.1'),
('Has', '1.97')],
'PSYC': [('Akçam', '3.9'),
('Deveci', '2.9'),
('Kumman', '2.9'),
('Madenoglu', '3.1'),
('Yeltekin', '1.2')]}
This solution uses itemgetter to pick out the surname, department, and GPA in one step:
from operator import itemgetter
d = dict()
with open('f0.txt', 'r') as f:
    for line in f:
        name, dept, gpa = itemgetter(0, -2, -1)(line.split())
        d.setdefault(dept, []).append((name, gpa))
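The suggestions made earlier in the thread (split on any whitespace, defaultdict, and with) can be combined into one function. This is only a sketch under a few assumptions that are not stated in the question: the file is UTF-8 encoded, the GPA should be stored as a float, and blank or too-short lines should be skipped.

```python
from collections import defaultdict

def read_student(ifile):
    by_dept = defaultdict(list)
    # Assumption: the file is UTF-8 encoded (the names contain non-ASCII letters).
    with open(ifile, encoding="utf-8") as f:
        for line in f:
            # split() with no argument treats tabs and spaces alike,
            # so the mixed separators in the file do not matter.
            cols = line.split()
            if len(cols) < 3:
                continue  # assumption: skip blank or incomplete lines
            surname, dept, gpa = cols[0], cols[-2], cols[-1]
            by_dept[dept].append((surname, float(gpa)))
    # Return a plain dict so a missing department raises KeyError as usual.
    return dict(by_dept)
```

With the file from the question, read_student('student.txt')['PSYC'] would start with ('Akçam', 3.9).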