Reading a \n-separated file in Python while ignoring the last \n

Question

I have a file called list.txt that looks like this:

input1
input2
input3

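I am sure there is no blank line after the last line (input3). I then have a Python script that reads this file line by line and writes each line into some more text to create 3 files (one per line):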

import os
os.chdir("/Users/user/Desktop/Folder")

with open('list.txt','r') as f:
    lines = f.read().split('\n')

    for l in lines:
        header = "#!/bin/bash \n#BSUB -J %s.sh \n#BSUB -o /scratch/DBC/user/%s.sh.out \n#BSUB -e /scratch/DBC/user/%s.sh.err \n#BSUB -n 1 \n#BSUB -q normal \n#BSUB -P DBCDOBZAK \n#BSUB -W 168:00\n"%(l,l,l)
        script = "cd /scratch/DBC/user\n"
        script2 = 'grep "input" %s > result.%s.txt\n'%(l,l)
        all= "\n".join([header,script,script2])

        with open('script_{}.sh'.format(l), 'w') as output:
            output.write(all)

My problem is that this creates 4 files instead of 3: script_input1.sh, script_input2.sh, script_input3.sh and script_.sh. The last file contains no input name, while each of the others contains input1, input2 or input3.

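It seems that Python reads my list.txt line by line, but when it reaches "input3" it somehow keeps going? How can I tell Python to read the file line by line, split on "\n", but stop after the last piece of text?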

Answer 1

First, don't read the whole file into memory when you don't need to. Files are iterable, so the proper way to read a file line by line is:

with open("/path/to/file.ext") as f:
    for line in f:
        do_something_with(line)

Now, in your for loop, you only have to strip the line and skip it if it is empty:

with open("/path/to/file.ext") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        do_something_with(line)

Slightly unrelated, but Python has multiline strings, so you don't need the concatenation either:

# not sure I got it right actually ;)
# the backslash after the opening quotes keeps "#!/bin/bash" on the first line
script_tpl = """\
#!/bin/bash 
#BSUB -J {line}.sh 
#BSUB -o /scratch/DBC/user/{line}.sh.out 
#BSUB -e /scratch/DBC/user/{line}.sh.err 
#BSUB -n 1 
#BSUB -q normal 
#BSUB -P DBCDOBZAK 
#BSUB -W 168:00
cd /scratch/DBC/user
grep "input" {line} > result.{line}.txt
"""

with open("/path/to/file.ext") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        script = script_tpl.format(line=line)
        with open('script_{}.sh'.format(line), 'w') as output:
            output.write(script)

One last note: avoid changing directory in your scripts; use os.path.join() to build absolute paths instead.
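
A minimal sketch of that last point, assuming the directory and file names from the question. `BASE_DIR` and `script_path` are names made up for this illustration and are not part of the original answer.

```python
import os

# Assumed base directory, taken from the question's os.chdir() call.
BASE_DIR = "/Users/user/Desktop/Folder"

def script_path(name, base_dir=BASE_DIR):
    """Return the full path of the script file for one input name."""
    return os.path.join(base_dir, "script_{}.sh".format(name))

# The input list is addressed the same way, without calling os.chdir().
list_path = os.path.join(BASE_DIR, "list.txt")
print(list_path)
print(script_path("input1"))
```

Passing these paths to open() gives the same result as the question's script, but the process's working directory is never changed.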

Answer 2

With your current approach, you will want to:

check whether the last element of lines is empty ( lines[-1] == '' )
and, if so, discard it ( lines = lines[:-1] ).

with open('list.txt','r') as f:
    lines = f.read().split('\n')

if lines[-1] == '':
    lines = lines[:-1]

for line in lines:    
    print(line)

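Keep in mind that a file may legitimately lack a trailing newline. Because the last element is removed only when it is empty, this check handles both cases.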
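
To illustrate that both cases are covered, here is a small sketch that wraps the check in a function and feeds it plain strings instead of a file. The name `split_lines` is hypothetical and only used for this example.

```python
def split_lines(text):
    """Split text on newlines and drop the empty element left by a trailing newline."""
    lines = text.split('\n')
    if lines[-1] == '':
        lines = lines[:-1]
    return lines

# Content that ends with a newline, and content that does not.
print(split_lines('input1\ninput2\ninput3\n'))
print(split_lines('input1\ninput2\ninput3'))
```

Both calls should return the same three names, so the loop that follows creates three files either way.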

Alternatively, as @setsquare pointed out, you might want to try readlines():

with open('list.txt', 'r') as f:
    lines = [ line.rstrip('\n') for line in f.readlines() ]

for line in lines:
    print(line)

Answer 3

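Have you considered using readlines() instead of read()? That lets Python deal with whether or not the last line ends with \n.

Bear in mind that if the last line of the input file does end with \n, then using read() and splitting on '\n' creates an extra value. For example: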

my_string = 'one\ntwo\nthree\n'
my_list = my_string.split('\n')
print(my_list)
# >> ['one', 'two', 'three', '']

Potential solution

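with open('list.txt', 'r') as f:
    lines = f.readlines()
# remove newlines
lines = [line.strip() for line in lines]
# remove any empty values, just in case
# (list() is needed in Python 3, where filter() returns an iterator)
lines = list(filter(bool, lines))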

For a simple example, see here: How do I read a file line-by-line into a list?
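
As an aside that is not part of the original answer, the built-in str.splitlines() avoids the extra value as well. A short comparison, using the same sample string as above:

```python
with_newline = 'one\ntwo\nthree\n'
without_newline = 'one\ntwo\nthree'

print(with_newline.split('\n'))      # ['one', 'two', 'three', '']
print(with_newline.splitlines())     # ['one', 'two', 'three']
print(without_newline.splitlines())  # ['one', 'two', 'three']
```

Note that splitlines() also splits on other line boundaries such as '\r\n', which may or may not be what you want.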

Answer 4

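f.read() returns a string that ends with a newline, and split dutifully treats that newline as separating the last line from an empty string. It is not clear why you read the entire file into memory explicitly. Just iterate over the file object and let it handle the line splitting.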

with open('list.txt','r') as f:
    for l in f:
        # ...
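
One detail worth showing: when iterating this way, each line still carries its trailing newline. The sketch below uses io.StringIO as a stand-in for the opened list.txt (an assumption made so the example runs without the file) and collects the cleaned names in a hypothetical `names` list.

```python
import io

# io.StringIO stands in for the opened list.txt so the sketch runs anywhere.
f = io.StringIO('input1\ninput2\ninput3\n')

names = []
for l in f:
    # Each l still ends with '\n' (if one was present), so strip it before use.
    names.append(l.rstrip('\n'))

print(names)  # ['input1', 'input2', 'input3']
```

No empty fourth element appears, because the iterator stops after the last real line.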

Answer 5

I think you are using split the wrong way.

If you have the following:

text = 'xxx yyy'
text.split(' ') # or simply text.split()

the result will be

['xxx', 'yyy']

Now, if you have:

text = 'xxx yyy ' # extra space at the end
text.split(' ')

the result will be

['xxx', 'yyy', '']

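This is because split(' ') returns what lies before and after each ' ' (space), and here an empty string follows the last space. Note that split() with no argument behaves differently: it splits on runs of whitespace and drops such empty strings.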
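
The difference between the two forms of split can be checked directly. This is a small illustrative sketch, not code from the original answer:

```python
text = 'xxx yyy '  # extra space at the end

print(text.split(' '))          # ['xxx', 'yyy', '']
print(text.split())             # ['xxx', 'yyy']
print(text.strip().split(' '))  # ['xxx', 'yyy']
```

The same reasoning applies to '\n' in the question: splitting on an explicit separator keeps the empty string after the final separator.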

Some functions you might use:

strip([chars]) # This removes all chars at the beginning or end of a string

Example:

text = '___text_about_something___'
text.strip('_')

The result will be:

'text_about_something'

In your specific problem, you could simply do:

lines = f.readlines() # read all lines of the file; each line keeps its trailing '\n'
for l in lines:
    l = l.strip() # remove the newline and any extra whitespace at the start or end of the line

5 answers

Answer 1
3, accepted, 2017-10-11 15:01:41

Answer 2
1, 2017-10-11 14:47:38

Answer 3
1, 2017-10-11 14:52:44

Answer 4
1, 2017-10-11 15:08:37

Answer 5
0, 2017-10-11 15:07:52
