I have several .txt files that contain the first and last names of authors. Here are two examples out of about thirty (the files do not all contain the same number of authors).
authors1.txt
AU - Jordan, M.
AU - Thomson, J.J.
AU - Einstein, A.
AU - Tesla, N.
authors3.txt
AU - Agassi, A.
AU - Herbert, P.H.
AU - Agut, R.B.
I want to extract the last and first names of the authors in each file. Since I am a beginner in Python, I wrote a script that more or less works:
with open('authors3.txt', 'r') as f:  # text mode, so split() works on str in Python 3
    textfile_temp = f.read()
#o_author1
o_author1 = textfile_temp.split('AU - ')[1]
L_name1 = o_author1.split(",")[0].strip()
F_name1 = o_author1.split(",")[1].strip()
print(L_name1)
print(F_name1)
#o_author2
o_author2 = textfile_temp.split('AU - ')[2]
L_name2 = o_author2.split(",")[0].strip()
F_name2 = o_author2.split(",")[1].strip()
print(L_name2)
print(F_name2)
#o_author3
o_author3 = textfile_temp.split('AU - ')[3]
L_name3 = o_author3.split(",")[0].strip()
F_name3 = o_author3.split(",")[1].strip()
print(L_name3)
print(F_name3)
My result is:
my result is:
Agassi
A.
Herbert
P.H.
Agut
R.B.
My question: is it possible to write a script with a loop, given that the authors#.txt files don't all contain the same number of authors?
Using a simple for-loop
Demo:
authors_lastName = []
authors_firstName = []
for filename in ["authors1.txt", "authors3.txt"]:
    with open(filename, "r") as infile:
        for line in infile:
            # str.strip removes leading/trailing whitespace; keep the text after the last "-", then split on ","
            val = line.strip().split("-")[-1].strip().split(",")
            authors_lastName.append(val[0])
            authors_firstName.append(val[1])
print(authors_lastName)
print(authors_firstName)
Output:
['Jordan', 'Thomson', 'Einstein', 'Tesla', 'Agassi', 'Herbert', 'Agut']
[' M.', ' J.J.', ' A.', ' N.', ' A.', ' P.H.', ' R.B.']
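One caveat with split("-")[-1]: it breaks on hyphenated surnames. A minimal sketch of an alternative, assuming every author line starts with the "AU - " tag from the example files, uses str.partition so only the first "AU - " and the first comma are split on. The function name parse_author_line and the surname "Curie-Sklodowska" are hypothetical, used only for illustration:

```python
def parse_author_line(line):
    """Split a line like 'AU - Jordan, M.' into ('Jordan', 'M.').

    Assumes the line starts with the 'AU - ' tag used in the example
    files and contains a comma between last and first name.
    """
    # partition() splits on the first occurrence only, so hyphens inside
    # a surname survive (unlike split("-")[-1]).
    _, _, name = line.strip().partition("AU - ")
    last, _, first = name.partition(",")
    return last.strip(), first.strip()


print(parse_author_line("AU - Jordan, M."))             # ('Jordan', 'M.')
print(parse_author_line("AU - Curie-Sklodowska, M."))   # ('Curie-Sklodowska', 'M.')
```

The .strip() on the first name also removes the leading space that shows up in the output above.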
I suggest you read your file line by line, for example:
with open('authors1.txt', 'r') as f:
    lines = f.readlines()
# lines = ["AU - Jordan, M.\n", "AU - Thomson, J.J.\n", "AU - Einstein, A.\n", "AU - Tesla, N.\n"]
for line in lines:
    o_author = line.strip().split('AU - ')[1]  # drop the newline, then the "AU - " tag
    L_name = o_author.split(",")[0].strip()
    F_name = o_author.split(",")[1].strip()
    print(L_name)
    print(F_name)
Jordan
M.
Thomson
J.J.
Einstein
A.
Tesla
N.
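The asker's original whole-file approach can also be turned into a loop directly: slicing the result of split('AU - ') from index 1 visits every author, however many the file contains. A minimal sketch; the function name authors_from_text is hypothetical:

```python
def authors_from_text(text):
    """Return (last, first) pairs for every 'AU - ' entry in text."""
    authors = []
    # Element 0 is whatever precedes the first 'AU - ' (an empty string
    # for the example files), so [1:] covers every author in the file.
    for entry in text.split("AU - ")[1:]:
        last, first = entry.split(",", 1)  # split on the first comma only
        authors.append((last.strip(), first.strip()))
    return authors


sample = "AU - Agassi, A.\nAU - Herbert, P.H.\nAU - Agut, R.B.\n"
for last, first in authors_from_text(sample):
    print(last)
    print(first)
```

With a real file you would pass it the result of f.read(), as in the question's script.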
You can list the files in your current (or any other) directory with os.listdir() or os.walk(). Once you have a list of author text files, you can loop through them with a simple for loop.
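A minimal sketch of the os.walk() route, assuming the author files follow the authors*.txt naming from the question (that glob pattern is an assumption; adjust it to your own names). It uses fnmatch.filter to pick matching names in each directory it visits; find_author_files is a hypothetical helper name:

```python
import fnmatch
import os


def find_author_files(root="."):
    """Yield paths of files named like 'authors*.txt' under root, recursively.

    The 'authors*.txt' pattern is an assumption based on the example
    file names in the question.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in fnmatch.filter(filenames, "authors*.txt"):
            yield os.path.join(dirpath, filename)


for path in sorted(find_author_files(".")):
    print(path)
```

If the files all live in one directory, os.listdir() (or glob.glob("authors*.txt")) is enough; os.walk() only matters when they are spread across subdirectories.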
Hint: iterating over a file object with a for loop yields one line at a time until the end of the file is reached. This is also memory efficient, because the file is read incrementally rather than loading its entire contents into memory at once.
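The same lazy behaviour can be carried through the parsing step with a generator. A minimal sketch, assuming author lines start with "AU - " and that any other lines (for example a trailing blank line) should be skipped; iter_authors is a hypothetical name, and io.StringIO only stands in for a real open file:

```python
import io


def iter_authors(lines):
    """Lazily yield (last, first) pairs from an iterable of 'AU - ' lines.

    `lines` can be an open file object, in which case the file is read
    incrementally as the generator is consumed.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("AU - "):
            continue  # skip blank or unrelated lines
        last, _, first = line[len("AU - "):].partition(",")
        yield last.strip(), first.strip()


# With a real file: with open("authors1.txt") as f: for author in iter_authors(f): ...
fake_file = io.StringIO("AU - Jordan, M.\nAU - Thomson, J.J.\n\n")
for author in iter_authors(fake_file):
    print(author)
```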
If you move the author-name extraction into a function, you can simplify your code to something like this:
import os

def get_author(line):
    # "AU - Jordan, M." -> ("Jordan", "M.")
    name = line.strip().split('AU - ')[1]
    lastname, firstname = name.split(',', 1)
    return lastname.strip(), firstname.strip()

if __name__ == '__main__':
    files = [f for f in os.listdir('.') if os.path.isfile(f)]
    # You probably want a more robust way of detecting author files
    files = [f for f in files if f.startswith('authors') and f.endswith('.txt')]
    authors = []
    for path in files:
        with open(path, 'r') as fd:
            for line in fd:
                if line.strip():  # skip blank lines, e.g. an empty last line
                    authors.append(get_author(line))
    print(authors)
At the end of the script, authors will be a list of tuples, each consisting of an author's last name and first name.
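Since the question asks for the names for each file, a flat list loses track of which file an author came from. A minimal sketch that keeps them grouped in a dict keyed by file name, assuming the "AU - Last, First" line format; authors_by_file is a hypothetical name, and the demo writes shortened copies of the example files to a temporary directory:

```python
import os
import tempfile

PREFIX = "AU - "


def authors_by_file(paths):
    """Map each file's base name to its list of (last, first) tuples.

    Keys are base names, so two files with the same name in different
    directories would overwrite each other.
    """
    result = {}
    for path in paths:
        authors = []
        with open(path, "r") as fd:
            for line in fd:
                if line.startswith(PREFIX):
                    last, _, first = line[len(PREFIX):].partition(",")
                    authors.append((last.strip(), first.strip()))
        result[os.path.basename(path)] = authors
    return result


# Demo with two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "authors1.txt": "AU - Jordan, M.\nAU - Thomson, J.J.\n",
        "authors3.txt": "AU - Agassi, A.\n",
    }
    paths = []
    for name, text in samples.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as out:
            out.write(text)
        paths.append(path)
    for name, authors in sorted(authors_by_file(paths).items()):
        print(name, len(authors), authors)
```

Because each file gets its own list, it does not matter that the files hold different numbers of authors.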
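I'm a bit rough on my Python, so treat this as a rough sketch:
with open('authors1.txt', 'r') as f:
    lines = f.read().splitlines()  # read the whole file, then split it into lines
for line in lines:
    parts = line.replace('AU - ', '').split(',')  # "Jordan, M." -> ["Jordan", " M."]
    print(parts[0].strip(), parts[1].strip())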
And that's it. Read the entire file into a variable, iterate over each line and extract the parts.
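If you do read the entire file into a variable, a regular expression can pull out every name in one pass instead of splitting line by line. This swaps in a different technique (Python's re module with re.MULTILINE) from the answers above; the pattern assumes the "AU - Last, First" layout of the example files, and authors_with_regex is a hypothetical name:

```python
import re

# One (last, first) pair per line of the form "AU - Last, First".
# Lines without a comma or without a first name are skipped.
AUTHOR_RE = re.compile(r"^AU - ([^,\n]+),[ \t]*(.+?)[ \t\r]*$", re.MULTILINE)


def authors_with_regex(text):
    return AUTHOR_RE.findall(text)


text = "AU - Agassi, A.\nAU - Herbert, P.H.\nAU - Agut, R.B.\n"
print(authors_with_regex(text))
```

re.findall returns a list of tuples when the pattern has more than one group, so the result already has the (last, first) shape.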
Or, basically do what @Rakesh suggested =)