f.read() does not read between lines

I use Python 3.6. I have some strings I want to check for in a read.txt file. The problem is that the .txt file is written such that a sentence may be cut and continued on the next line. For example:

bla bla bla internal control over financial reporting or an attestation
report of our auditors

The .txt file breaks the sentence after the word "attestation" and continues with "report" on the following line. I want to look for the entire sentence in the file, irrespective of which line it is on (and set var1 = 1 if the sentence is in the file, and 0 otherwise).

I use the following code to parse the files (it seems I don't know how to specify that line breaks should be ignored):

import re

str1 = 'internal control over financial reporting or an attestation report of our auditors'
exemptions = []
for eachfile in file_list:  # I have many .txt files in my directory
    with open(eachfile, 'r', encoding='utf-8') as f:
        line2 = f.read()  # line2 holds the entire contents of the .txt file
        var1 = re.findall(str1, line2, re.I)  # find str1 in line2
        if var1:
            exemptions.append('1')  # match found: record var1 = 1
        else:
            exemptions.append('0')  # otherwise record var1 = 0

Any idea how to do that? I thought that by using line2 = f.read() I was checking the whole .txt file, irrespective of lines, but that does not seem to be the case.

Thank you anyway!

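You're assuming a newline is the same as a space; it's not. f.read() does return the whole file, but the break between "attestation" and "report" is a newline character, which the literal space in your pattern does not match. Try changing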

line2 = f.read()

to

line2 = f.read().replace('\n', ' ').replace('\r', ' ')

This replaces every newline in the file with a space, so the search works as intended.

You could similarly do

line2 = ' '.join(line.rstrip('\n') for line in f)
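Both variants above turn each line break into exactly one space, so a line that already ends in a space (or a file with blank lines) would leave two or more spaces in a row and the single-space search string would still miss. A minimal sketch of a slightly more forgiving variation, assuming it is acceptable to collapse every run of whitespace; the helper name normalize_whitespace and the sample text are made up for illustration:

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces.
    return ' '.join(text.split())


# Hypothetical file contents: note the trailing space before the line break.
sample = (
    "bla bla bla internal control over financial reporting or an attestation \n"
    "report of our auditors\n"
)
phrase = 'internal control over financial reporting or an attestation report of our auditors'

# Plain substring search on the raw text fails because of the line break.
found_raw = phrase in sample

# After normalizing, the phrase is found; lower() makes the check case-insensitive.
found_normalized = phrase.lower() in normalize_whitespace(sample).lower()

print(found_raw, found_normalized)
```

Once the text is normalized, a plain `in` test is enough, so no regex is needed for a fixed phrase.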

You could instead modify your regex:

var1 = re.findall(str1.replace(' ', r'\s+'), line2, re.I)  # find str1 in line2, allowing any whitespace between words
if var1:
    exemptions.append('1')
else:
    exemptions.append('0')

In regex terms, \s matches any whitespace character (space, tab, newline, and so on), and \s+ matches one or more of them, so the words may be separated by multiple spaces or by a line break.
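The regex approach can be sketched as a small self-contained helper. This is only an illustration under a few assumptions: the function names phrase_pattern and contains_phrase are hypothetical, and the use of re.escape is an addition not in the answer above, included so that punctuation in a search phrase (for example a period) is matched literally rather than treated as regex syntax.

```python
import re


def phrase_pattern(phrase):
    """Build a case-insensitive pattern that allows any whitespace between words."""
    words = phrase.split()
    # Escape each word so regex metacharacters are matched literally,
    # then join the words with \s+ (one or more whitespace characters).
    return re.compile(r'\s+'.join(re.escape(word) for word in words), re.IGNORECASE)


def contains_phrase(text, phrase):
    """Return '1' if the phrase occurs in text (ignoring line breaks), else '0'."""
    return '1' if phrase_pattern(phrase).search(text) else '0'


# Hypothetical file contents with the sentence split across two lines.
sample = (
    "bla bla bla internal control over financial reporting or an attestation\n"
    "report of our auditors\n"
)
str1 = 'internal control over financial reporting or an attestation report of our auditors'

print(contains_phrase(sample, str1))
print(contains_phrase("nothing relevant here", str1))
```

Inside the original loop, `exemptions.append(contains_phrase(line2, str1))` would replace the if/else. Using search instead of findall stops at the first match, which is all that is needed for a yes/no flag.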
