
How to read .txt file without .readlines() / replace UTF-8 newline character with \n?

I have some AI-generated nonsense in a .txt file that looks like this:

MENENIUS:
I have been they prayers of the reason,
And away to friends than the state pointer;
The words that shall can virtue to your head.

I have some Python code (using web.py) that looks like this:

class index(object):
    def GET(self):
        text = open("menenius.txt", "r").read() 
        return render.index(text)

When I view it on localhost, it looks like this:

MENENIUS: I have been they prayers of the reason, And away to friends than the state pointer; The words that shall can virtue to your head.

Menenius' little speech is actually just one clipping of a much larger .txt file, so I don't want to use .readlines(), as building and going over the list will be memory-intensive. If that weren't an issue, in a normal script I'd be able to just print the list that .readlines() generates, but the fact that I'm using web.py and need to get this into render.index() complicates things.
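For reference, iterating over the file object itself reads one line at a time, so no list of lines is ever built (note that .read() also loads the whole file into memory at once). A minimal sketch, assuming Python 3; the helper name iter_lines and the throwaway demo file are hypothetical and only stand in for menenius.txt:

```python
import os
import tempfile

def iter_lines(path):
    """Yield lines one at a time instead of building a list like .readlines()."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:  # the file object is itself a lazy line iterator
            yield line.rstrip("\n")

# Demo on a throwaway file (hypothetical content standing in for menenius.txt).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("MENENIUS:\nI have been they prayers of the reason,\n")
    demo_path = tmp.name

lines = list(iter_lines(demo_path))
os.remove(demo_path)
```

Calling list() here is only for the demo; in real use the generator would be consumed line by line.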

What I've Tried

My first thought was to use the .replace() method in the script that generates menenius.txt to replace every instance of the invisible UTF-8 newline character with \n. Since .read() gives you the entire .txt file as a single string, I thought that would work, but doing this:

from_text = open("menenius.txt", "r").read()
from_text.replace(0x0A, "\n")

Gets me this error, referring to the line with .replace():

TypeError: expected a character buffer object

I've googled the error, but none of what I found seems very applicable or very clear. I'm just starting out with Python and I've been going around in circles with this for a couple of hours, so I feel like there's something really obvious here that I don't know about.


As I mentioned, I've also tried returning the list that .readlines() generates, but that's going to get memory-intensive and I'm not sure how to fit that output into render.index() anyway.

Edit: The Solution

So the answer below works, but after I made that change I was still having the same issue. ShadowRanger's "I'm assuming your renderer is sending out HTML" got me thinking, and I opened up localhost and got into the web inspector to see that all the text was in quotation marks within its p tags, like so:

<p>
"MENENIUS: I have been they prayers of the reason, And away to friends than the state pointer; The words that shall can virtue to your head."
</p>

I came back to this after a few hours, having realised something. In the index.html file the content was being sent to, it looked like this:

<p>
$content
</p>

I had a suspicion, checked the web.py intro tutorial again and found this:

As you can see, the templates look a lot like Python files except for the def with statement at the top (saying what the template gets called with) and the $s placed in front of any code. Currently, template.py requires the $def statement to be the first line of the file. Also, note that web.py automatically escapes any variables used here, so that if for some reason name is set to a value containing some HTML, it will get properly escaped and appear as plain text. If you want to turn this off, write $:name instead of $name.

I changed $content to $:content, and now the text is rendered as HTML rather than as an escaped plain string.
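The effect of automatic escaping can be imitated with the standard library. In this sketch, html.escape is only a stand-in for whatever web.py does internally (an assumption; web.py's own escaping routine may differ in detail), and the sample string is made up for illustration:

```python
import html

content = "MENENIUS:<br>\nI have been they prayers of the reason,"

# Roughly what an auto-escaping insertion ($content) would emit:
# the markup is converted to entities and shows up as literal text.
escaped = html.escape(content)

# Roughly what an unescaped insertion ($:content) would emit:
# the markup is left intact for the browser to interpret.
raw = content
```

Turning escaping off sends any HTML in the file to the browser as-is, so it only makes sense for text you trust.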

Your file already contains newlines ('\x0a' is an escape for the exact same character that '\n' produces). I'm assuming your renderer is sending out HTML though, and HTML doesn't care about newlines in the text (outside of pre blocks, and other blocks styled similarly).
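A quick check of that equivalence, and of why the call in the question fails, might look like this. It is a sketch assuming Python 3, where the TypeError message is worded differently from the one quoted in the question; the sample string is made up:

```python
# "\x0a", "\n", and chr(0x0A) all denote the same single character (line feed).
assert "\x0a" == "\n" == chr(0x0A)

sample = "MENENIUS:\nI have been they prayers of the reason,\n"

# str.replace() expects strings; passing the integer 0x0A raises TypeError.
try:
    sample.replace(0x0A, "\n")
    raised = False
except TypeError:
    raised = True

# Strings are immutable, so replace() returns a new string that must be kept.
# Replacing a newline with a newline leaves the text unchanged.
unchanged = sample.replace("\x0a", "\n")
```

Note that the original snippet also discards the return value of .replace(), so even a valid call would have had no visible effect.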

So either wrap the data in a pre block, or replace the '\n' with <br> tags (which are how HTML says "No, really, I want a line break"), e.g.:

from_text = from_text.replace("\n", "<br>\n")
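Both options can be sketched as small helpers. The function names are hypothetical, and the html.escape step is an addition not in the answer above: it keeps characters such as < and & in the file from being read as markup once the result is inserted unescaped (for example via $:content in a web.py template).

```python
import html

def as_pre_block(text):
    """Wrap text in a <pre> block so the browser keeps its line breaks."""
    # Escape first so characters like < and & in the text are not read as markup.
    return "<pre>" + html.escape(text) + "</pre>"

def with_br_tags(text):
    """Escape the text, then mark every newline with a <br> tag."""
    return html.escape(text).replace("\n", "<br>\n")

sample = "MENENIUS:\nI have been they prayers of the reason,"
pre_html = as_pre_block(sample)
br_html = with_br_tags(sample)
```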

Leaving in the newlines may be handy to people viewing the source, so I replaced each newline with both the <br> tag and a newline (str.replace makes a single pass and does not rescan the text it inserts, so there is no risk of infinite replacement just because the newline is part of the replacement).
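A tiny illustration of that single-pass behaviour, using a made-up two-line string:

```python
text = "a\nb\n"

# Each original newline is replaced exactly once; the newline inside the
# replacement string is not scanned again, so the call terminates.
result = text.replace("\n", "<br>\n")
```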
