How to extract words (second and third row) from txt files using a loop in Python

Question

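I have several txt files that contain authors' last names and first names. Here are two examples out of roughly 30 (they do not contain the same number of authors).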

authors1.txt

AU  - Jordan, M. 
AU  - Thomson, J.J.  
AU  - Einstein, A.  
AU  - Tesla, N.

authors3.txt

AU  - Agassi, A.
AU  - Herbert, P.H.
AU  - Agut, R.B.

For each file, I want to extract the authors' last names and first names. Since I am a Python beginner, I wrote a script that more or less does the job.

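# Open in text mode ('r'): in Python 3, 'rb' returns bytes, and bytes.split() rejects a str separator
with open('authors3.txt', 'r') as f:
    textfile_temp = f.read()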

#o_author1 
o_author1 = textfile_temp.split('AU  - ')[1]
L_name1  = o_author1.split(",")[0]
F_name1  = o_author1.split(",")[1]
print(L_name1)
print(F_name1)

#o_author2 
o_author2 = textfile_temp.split('AU  - ')[2]
L_name2  = o_author2.split(",")[0]
F_name2  = o_author2.split(",")[1]
print(L_name2)
print(F_name2)

#o_author3 
o_author3 = textfile_temp.split('AU  - ')[3]
L_name3  = o_author3.split(",")[0]
F_name3  = o_author3.split(",")[1]
print(L_name3)
print(F_name3)

My result is:

Agassi
 A.

Herbert
 P.H.

Agut
 R.B.

My question: given the authors#.txt files, where each file contains a different number of authors, is it possible to write a script that uses a loop?

Answer 1

Use a simple for loop.

Demo:

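authors_lastName = []
authors_firstName = []
for filename in ["authors1.txt", "authors3.txt"]:
    with open(filename, "r") as infile:
        for line in infile:
            if not line.strip():
                continue   # skip blank lines
            # str.strip removes leading/trailing whitespace; split on the first "-" to drop "AU", then on ","
            val = line.strip().split("-", 1)[-1].strip().split(",")
            authors_lastName.append(val[0])    # the part before the comma is the last name
            authors_firstName.append(val[1])   # the part after the comma is the first name
print(authors_lastName)
print(authors_firstName)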

Output:

['Jordan', 'Thomson', 'Einstein', 'Tesla', 'Agassi', 'Herbert', 'Agut']
[' M.', ' J.J.', ' A.', ' N.', ' A.', ' P.H.', ' R.B.']
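
The first names in the output above still carry the space that follows the comma. Here is a minimal sketch of a helper that strips it, assuming every author line has the form `AU  - Last, First`. The name `parse_author_line` is hypothetical and does not come from any of the answers.

```python
def parse_author_line(line):
    """Return (last_name, first_name) for a line like 'AU  - Jordan, M.'.

    Returns None for lines that do not match the assumed format.
    """
    line = line.strip()
    if not line.startswith("AU"):
        return None
    # partition splits on the first "-" only, so hyphenated surnames stay intact.
    _, dash, name = line.partition("-")
    # The text before the first "," is the last name, the rest is the first name.
    last_name, comma, first_name = name.partition(",")
    if not dash or not comma:
        return None
    return last_name.strip(), first_name.strip()


print(parse_author_line("AU  - Thomson, J.J.  "))
print(parse_author_line(""))
```

Returning `None` for lines that do not match lets the calling loop skip blank lines instead of failing with an `IndexError`.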

Answer 2

I suggest reading the file line by line, for example:

with open('authors1.txt', 'r') as f:   # text mode, so each line is a str
    lines = f.readlines()

# lines = ["AU  - Jordan, M.\n", "AU  - Thomson, J.J.\n", "AU  - Einstein, A.\n", "AU  - Tesla, N.\n"]

for line in lines:
    line = line.strip()                # drop the trailing newline and spaces
    if not line:
        continue                       # skip blank lines
    o_author = line.split('AU  - ')[1]
    L_name = o_author.split(",")[0]
    F_name = o_author.split(",")[1]
    print(L_name)
    print(F_name)

Jordan
 M.
Thomson
 J.J.
Einstein
 A.
Tesla
 N.

Answer 3

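You can use os.listdir() or os.walk() to get the files in the current (or any other) directory. Once you have the list of author text files, you can iterate over them with a simple for loop.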
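
As a rough illustration of the os.walk() variant, the sketch below collects every file named like `authors*.txt` under a directory, including subdirectories. The function name `find_author_files` and the `authors*.txt` pattern are assumptions made for this example.

```python
import fnmatch
import os


def find_author_files(root="."):
    """Return the sorted paths of files named like authors*.txt under root."""
    matches = []
    # os.walk yields (directory, subdirectories, filenames) for root
    # and for every directory below it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, "authors*.txt"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling `find_author_files(".")` returns the matching paths below the current directory. os.listdir() is enough when all author files sit in a single folder.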

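Tip: iterating over a file object yields one line at a time until the end of the file is reached. This is also memory-efficient, because only one line is read into memory at a time instead of the whole file's contents.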
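
The tip can be sketched as a small generator. The name `iter_author_lines` is hypothetical, and the demo writes a throwaway file so that the example runs anywhere.

```python
import os
import tempfile


def iter_author_lines(path):
    """Yield the non-empty lines of a text file one at a time."""
    with open(path, "r") as fd:
        # Iterating over fd reads the file lazily, line by line,
        # instead of loading everything at once as fd.read() would.
        for line in fd:
            line = line.strip()
            if line:
                yield line


# Demo on a temporary file that contains one blank line.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "authors1.txt")
    with open(path, "w") as fd:
        fd.write("AU  - Jordan, M.\n\nAU  - Tesla, N.\n")
    demo_lines = list(iter_author_lines(path))

print(demo_lines)
```

For files of a few lines the difference is negligible; the lazy form mainly pays off once the files get large.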

If you move the author-name extraction into a function, you can simplify the code to the following:

import os

def get_author(line):
    # "AU  - Jordan, M." -> name == "Jordan, M."
    name = line.strip().split('AU  - ')[1]
    # The last name comes before the comma, the first name after it
    lastname, firstname = name.split(',', 1)
    return firstname.strip(), lastname.strip()

if __name__ == '__main__':
    files = [f for f in os.listdir('.') if os.path.isfile(f)]
    # You probably want a more fancy way of detecting author files
    files = [f for f in files if f.startswith('authors') and f.endswith('.txt')]

    authors = []
    for file in files:
        with open(file, 'r') as fd:
            for line in fd:
                if line.strip():  # skip blank lines
                    authors.append(get_author(line))
    print(authors)

At the end of the script, authors will be a list of tuples, each consisting of an author's first name and last name.
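
Since the question asks for the names per file, one possible variation keeps the authors grouped by file name instead of collecting them in one flat list. This is only a sketch: `read_authors` and `authors_by_file` are hypothetical names, and the demo files imitate the samples from the question.

```python
import os
import tempfile


def read_authors(path):
    """Return a list of (first_name, last_name) tuples from one author file."""
    authors = []
    with open(path, "r") as fd:
        for line in fd:
            line = line.strip()
            if not line.startswith("AU"):
                continue  # skip blank or unrelated lines
            _, dash, name = line.partition("-")
            last, comma, first = name.partition(",")
            if dash and comma:
                authors.append((first.strip(), last.strip()))
    return authors


def authors_by_file(paths):
    """Map each file name to the authors found in that file."""
    return {os.path.basename(p): read_authors(p) for p in paths}


# Demo with two throwaway files that mimic the samples in the question.
samples = {
    "authors1.txt": "AU  - Jordan, M.\nAU  - Tesla, N.\n",
    "authors3.txt": "AU  - Agassi, A.\nAU  - Herbert, P.H.\nAU  - Agut, R.B.\n",
}
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in samples.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as fd:
            fd.write(text)
        paths.append(path)
    result = authors_by_file(paths)

for name, authors in result.items():
    print(name, len(authors), authors)
```

A dictionary keyed by file name copes with files that hold different numbers of authors, because each value is simply a list of whatever length that file produced.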

Answer 4

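My Python is a bit rusty, so treat this as a rough sketch:

import re

with open('authors1.txt', 'r') as f:
    lines = f.readlines()

for line in lines:
    # Split on both separators: "AU  - Jordan, M." -> ["AU  ", " Jordan", " M."]
    parts = re.split(r"[-,]", line.strip())
    if len(parts) >= 3:
        print(parts[1].strip(), parts[2].strip())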

That's it. Read the whole file into a variable, iterate over each line, and extract the parts.

Or basically do what @Rakesh suggested =)

4 answers

Answer 1
3 2018-05-18 07:54:42

Answer 2
1 2018-05-18 07:56:11

Answer 3
1 accepted 2018-05-18 07:58:41

Answer 4
0 2018-05-18 07:55:51
