Python: How to properly use readline() and readlines()

I've built a Python script that randomly creates sentences using data from the Princeton English WordNet, following the diagrams in Gödel, Escher, Bach. Calling python GEB.py produces a list of nonsensical English sentences, such as:

resurgent inaesthetic cost.
the bryophytic fingernail.
aversive fortieth peach.
the asterismal hide.
the flour who translate gown which take_a_dare a punch through applewood whom the renewed request enfeoff.
an lobeliaceous freighter beside tuna.

It saves them to gibberish.txt. This script works fine.

Another script (translator.py) takes gibberish.txt and, using the py-googletrans Python module, tries to translate those random sentences into Portuguese:

from googletrans import Translator
import json

tradutor = Translator()

with open('data.json') as dataFile:
    data = json.load(dataFile)


def buscaLocal(keyword):
    if keyword in data:
        print(keyword + data[keyword])
    else:
        buscaAPI(keyword)


def buscaAPI(keyword):
    result = tradutor.translate(keyword, dest="pt")
    data.update({keyword: result.text})

    with open('data.json', 'w') as fp:
        json.dump(data, fp)

    print(keyword + result.text)


keyword = open('/home/user/gibberish.txt', 'r').readline()
buscaLocal(keyword)

Currently the second script outputs only the translation of the first sentence in gibberish.txt. Something like:

resurgent inaesthetic cost. aumento de custos inestético.

I have tried to use readlines() instead of readline(), but I get the following error:

Traceback (most recent call last):
  File "main.py", line 28, in <module>
    buscaLocal(keyword)
  File "main.py", line 11, in buscaLocal
    if keyword in data:
TypeError: unhashable type: 'list'

I've read similar questions about this error here, but it is not clear to me what I should use to read the whole list of sentences contained in gibberish.txt (each new sentence begins on a new line).

How can I read the whole list of sentences contained in gibberish.txt? How should I adapt the code in translator.py to achieve that? I am sorry if the question is a bit confusing; I can edit it if necessary. I am a Python newbie and would appreciate it if someone could help me out.

If you use readline(), remember that it returns only one line, so you need a loop to go through all the lines in the text file. readlines() reads the whole file at once, but returns the lines as a list. A list is unhashable and cannot be used as a key in a dict, which is why the line if keyword in data: raises this error: keyword here is a list of all the lines. A simple for loop solves the problem.

text_lines = open('/home/user/gibberish.txt', 'r').readlines()
for line in text_lines:
    buscaLocal(line)

This loop iterates through all the lines in the list, and there will be no error when accessing the dict, because each key is now a string.
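To make the cause of the TypeError concrete, here is a minimal sketch. The dict contents and the Portuguese text are made up for illustration; they stand in for data.json and are not real translator output.

```python
# Hypothetical cache standing in for the contents of data.json.
data = {"the asterismal hide.": "a pele asterismal."}

# What readlines() would return: a list of strings, newlines included.
lines = ["the asterismal hide.\n", "aversive fortieth peach.\n"]

try:
    lines in data  # a list cannot be hashed, so the membership test itself fails
except TypeError as exc:
    error_message = str(exc)

# Testing one string at a time works. rstrip() removes the trailing newline
# so the string matches the key stored in the dict.
found = [line.rstrip() in data for line in lines]

print(error_message)
print(found)
```

Note that each line returned by readlines() still ends with a newline, so stripping it before the lookup keeps the dict keys consistent.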

Let's start with what you're doing to the file object. You open a file, get a single line from it, and then don't close it. A better way is to process the entire file and then close it. This is generally done with a with block, which closes the file even if an error occurs:

with open('gibberish.txt') as f:
    ...  # do stuff to f

Aside from the material benefits, this will make the interface clearer, since f is no longer a throwaway object. You have three easy options for processing the entire file:

  1. Use readline in a loop, since it reads only one line at a time. You will have to strip off the newline characters manually and terminate the loop when '' appears:

     while True:
         line = f.readline()
         if not line:
             break
         keyword = line.rstrip()
         buscaLocal(keyword)

    This loop can take many forms, one of which is shown here.

  2. Use readlines to read all the lines in the file at once into a list of strings:

     for line in f.readlines():
         keyword = line.rstrip()
         buscaLocal(keyword)

    This is much cleaner than the previous option, since you don't need to check for loop termination manually, but it has the disadvantage of loading the entire file into memory at once, which the readline loop does not.

    This brings us to the third option.

  3. Python files are iterable objects. You can have the cleanliness of the readlines approach with the memory savings of readline:

     for line in f:
         buscaLocal(line.rstrip())

    This approach can be simulated using readline with the more arcane two-argument form of iter, which calls f.readline repeatedly until it returns the sentinel '':

     for line in iter(f.readline, ''):
         buscaLocal(line.rstrip())
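As a rough check that these loops are interchangeable, the sketch below runs each of them on an in-memory file. io.StringIO stands in for gibberish.txt, the function names are invented for this example, and the lines are collected into a list instead of being passed to buscaLocal. The last function uses the built-in iter(callable, sentinel), which keeps calling f.readline until it returns ''.

```python
import io

# In-memory stand-in for gibberish.txt (sample sentences from the question).
SAMPLE = "resurgent inaesthetic cost.\nthe bryophytic fingernail.\n"

def with_readline(f):
    result = []
    while True:
        line = f.readline()
        if not line:  # '' is returned only at end of file
            break
        result.append(line.rstrip())
    return result

def with_readlines(f):
    # Loads every line into a list before stripping.
    return [line.rstrip() for line in f.readlines()]

def with_iteration(f):
    # File objects yield one line at a time when iterated.
    return [line.rstrip() for line in f]

def with_iter_sentinel(f):
    # iter(callable, sentinel) calls f.readline() until it returns ''.
    return [line.rstrip() for line in iter(f.readline, '')]

readers = (with_readline, with_readlines, with_iteration, with_iter_sentinel)
results = [read(io.StringIO(SAMPLE)) for read in readers]
print(results[0])
```

A blank line in the middle of the file is read as '\n', not '', so none of these loops stops early on it.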

As a side point, I would make some modifications to your functions:

def buscaLocal(keyword):
    if keyword not in data:
        buscaAPI(keyword)
    print(keyword + data[keyword])

def buscaAPI(keyword):
    # Make your function do one thing. In this case, do a lookup.
    # Printing is not the task for this function.
    result = tradutor.translate(keyword, dest="pt")
    # No need to do a complicated update with a whole new
    # dict object when you can do a simple assignment.
    data[keyword] = result.text

...

# Avoid rewriting the file every time you get a new word.
# Do it once at the very end.
with open('data.json', 'w') as fp:
    json.dump(data, fp)
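Putting the pieces together, one possible shape for translator.py is sketched below. It is only an illustration under several assumptions: fake_translate is a hypothetical stand-in for tradutor.translate(keyword, dest="pt").text so the example runs without network access, translate_file is an invented helper name, skipping blank lines is an added choice, and temporary files replace the real gibberish.txt and data.json.

```python
import json
import os
import tempfile

def fake_translate(sentence):
    # Hypothetical stand-in for tradutor.translate(sentence, dest="pt").text,
    # used so that the sketch runs without network access.
    return "[pt] " + sentence

def translate_file(sentences_path, cache_path, translate=fake_translate):
    # Load the cache if it exists; otherwise start with an empty one.
    if os.path.exists(cache_path):
        with open(cache_path) as cache_file:
            data = json.load(cache_file)
    else:
        data = {}

    translations = []
    with open(sentences_path) as f:
        for line in f:  # iterate over the file one line at a time
            keyword = line.rstrip()
            if not keyword:  # assumption: blank lines are skipped
                continue
            if keyword not in data:
                data[keyword] = translate(keyword)
            translations.append(data[keyword])

    # Write the cache once, after the whole file has been processed.
    with open(cache_path, "w") as cache_file:
        json.dump(data, cache_file)
    return translations

# Usage with temporary files standing in for gibberish.txt and data.json.
with tempfile.TemporaryDirectory() as tmp:
    sentences_path = os.path.join(tmp, "gibberish.txt")
    cache_path = os.path.join(tmp, "data.json")
    with open(sentences_path, "w") as f:
        f.write("aversive fortieth peach.\nthe asterismal hide.\n")
    output = translate_file(sentences_path, cache_path)
    with open(cache_path) as cache_file:
        cached = json.load(cache_file)

print(output)
```

Passing the translation function as a parameter keeps the file handling testable; in the real script the googletrans call would be supplied in its place.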
