Stripping whitespace and newlines when reading from a file

Question

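I have the following code, which successfully strips end-of-line characters when reading from a file, but does not do the same for leading and trailing whitespace (I want the spaces in the middle to be left alone!).

What is the best way to achieve this? (Note: this is a specific example, so it is not a duplicate of the general questions about stripping strings.)

My code (try it with the test data: "Mr Moose" is not found, but if you try "Mr Moose " (that is, with a space after Moose), it works):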

#A COMMON ERROR is leaving in blank spaces and then finding you cannot work with the data in the way you want!

"""Try the following program with the input: Mr Moose
...it doesn't work..........
but if you try "Mr Moose " (that is a space after Moose..."), it will work!
So how to remove both new lines AND leading and trailing spaces when reading from a file into a list. Note, the middle spaces between words must remain?
"""

alldata=[]
col_num=0
teacher_names=[]
delimiter=":"

with open("teacherbook.txt") as f:
    for line in f.readlines():
        alldata.append((line.strip()))
    print(alldata)


    print()
    print()

    for x in alldata:
        teacher_names.append(x.split(delimiter)[col_num])

    teacher=input("Enter teacher you are looking for:")
    if teacher in teacher_names:
        print("found")
    else:
        print("No")

Desired output when building the list alldata:

['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']

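i.e. remove all leading and trailing spaces at the start of the line and before or after the delimiter. The spaces between words, as in Mr Moose, must be left.

Contents of teacherbook.txt: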

Mr Moose : Maths
Mr Goose: History
Mrs Congenelipilling: English

Thanks in advance.

Answer 1

You can use a regular expression:

>>> import re
>>> txt='''\
... Mr Moose : Maths
... Mr Goose: History
... Mrs Congenelipilling: English'''
>>> [re.sub(r'\s*:\s*', ':', line).strip() for line in txt.splitlines()]
['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']

So your code becomes:

import re
col_num=0
teacher_names=[]
delimiter=":"

with open("teacherbook.txt") as f:
    # strip() rather than rstrip(), so that leading whitespace is removed as well
    alldata=[re.sub(r'\s*{}\s*'.format(delimiter), delimiter, line).strip() for line in f]
    print(alldata)

    for x in alldata:
        teacher_names.append(x.split(delimiter)[col_num])
    print(teacher_names)

Prints:

['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']
['Mr Moose', 'Mr Goose', 'Mrs Congenelipilling']

The key part is the regular expression:

re.sub(r'\s*{}\s*'.format(delimiter), delimiter, line).strip()

          ^                          zero or more whitespace characters before the delimiter
            ^                        placeholder for the delimiter
              ^                      zero or more whitespace characters after the delimiter
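One caveat when the pattern is built from a variable: if the delimiter were ever a character with a special meaning in regular expressions (for example "|"), it would have to be escaped first. A minimal sketch of that idea follows; the helper name normalize and the second sample line are made up for illustration, while re.escape is the standard-library function for escaping.

```python
import re

def normalize(line, delimiter=":"):
    """Remove whitespace around the delimiter and at both ends of the line."""
    # re.escape makes characters such as "|" match literally.
    pattern = r"\s*{}\s*".format(re.escape(delimiter))
    return re.sub(pattern, delimiter, line).strip()

print(normalize("  Mr Moose : Maths\n"))        # Mr Moose:Maths
print(normalize("Mr Goose |  History\n", "|"))  # Mr Goose|History
```

With the ":" delimiter from the question the escaping changes nothing, so this only matters if the delimiter is configurable.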


For an all-Python solution, I would use str.partition to get the left-hand and right-hand sides of the delimiter, then strip whitespace as needed:

alldata=[]
with open("teacherbook.txt") as f:
    for line in f:
        # strip() the whole line first so leading whitespace goes too
        lh,sep,rh=line.strip().partition(delimiter)
        alldata.append(lh.rstrip() + sep + rh.lstrip())

Same output.

Another suggestion: your data is better suited to a dict than to a list.

You can do:

di={}
with open("teacherbook.txt") as f:
    for line in f:
        lh,sep,rh=line.strip().partition(delimiter)
        di[lh.rstrip()]=rh.lstrip()

Or the comprehension version:

with open("teacherbook.txt") as f:
    di={lh.rstrip():rh.lstrip()
          for lh,_,rh in (line.strip().partition(delimiter) for line in f)}

Then access it like this:

>>> di['Mr Moose']
'Maths'
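To try the partition approach without creating teacherbook.txt, the file can be simulated with io.StringIO. This is only a sketch: the function name load_teachers is hypothetical, and the sample text mirrors the file contents from the question with one blank line added to show how lines without a delimiter could be skipped.

```python
import io

def load_teachers(lines, delimiter=":"):
    """Build a {teacher: subject} dict from an iterable of 'name : subject' lines."""
    teachers = {}
    for line in lines:
        name, sep, subject = line.partition(delimiter)
        if not sep:  # no delimiter on this line (e.g. a blank line): skip it
            continue
        teachers[name.strip()] = subject.strip()
    return teachers

sample = io.StringIO(
    "Mr Moose : Maths\nMr Goose: History\n\nMrs Congenelipilling: English\n"
)
teachers = load_teachers(sample)
print(teachers["Mr Moose"])    # Maths
print("Mr Moose" in teachers)  # True
```

Because any iterable of lines works, the same function should accept an open file object in place of the StringIO.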

Answer 2

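There is no need to use readlines(); you can simply iterate over the file object to get each line, and use strip() to remove the \n and the surrounding whitespace. So you can use this list comprehension: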

with open('teacherbook.txt') as f:
    alldata = [':'.join([value.strip() for value in line.split(':')]) 
               for line in f]
    print(alldata)

Output:

['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']
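To see what the comprehension does to a single line, the split, strip and join steps can be run one at a time. A small illustration, using the first line of the file from the question as a hard-coded string:

```python
line = "Mr Moose : Maths\n"           # one raw line as read from the file

parts = line.split(":")               # ['Mr Moose ', ' Maths\n']
cleaned = [p.strip() for p in parts]  # ['Mr Moose', 'Maths']
result = ":".join(cleaned)            # 'Mr Moose:Maths'

print(parts)
print(cleaned)
print(result)
```

Each field is stripped on its own, so the space inside "Mr Moose" is never touched.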

Answer 3

Change:

teacher_names.append(x.split(delimiter)[col_num])

to:

teacher_names.append(x.split(delimiter)[col_num].strip())
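The reason this fixes the lookup can be shown on one line of the data: splitting at the delimiter leaves the space that preceded it attached to the name. A short illustration, with the line hard-coded rather than read from the file:

```python
delimiter = ":"
line = "Mr Moose : Maths"

raw_name = line.split(delimiter)[0]            # 'Mr Moose ' (note the trailing space)
clean_name = line.split(delimiter)[0].strip()  # 'Mr Moose'

print(repr(raw_name))
print("Mr Moose" in [raw_name])    # False
print("Mr Moose" in [clean_name])  # True
```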

Answer 4

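i.e. remove all leading and trailing spaces at the start of the line and before or after the delimiter. The spaces between words, as in Mr Moose, must be left.

You can split the string at the delimiter, strip the whitespace from each piece, and then join the pieces back together: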

for line in f.readlines():
    new_line = ':'.join([s.strip() for s in line.split(':')])
    alldata.append(new_line)

Example:

>>> lines = ['  Mr Moose :   Maths', ' Mr Goose :  History  ']
>>> lines
['  Mr Moose :   Maths', ' Mr Goose :  History  ']
>>> data = []
>>> for line in lines:
...     new_line = ':'.join([s.strip() for s in line.split(':')])
...     data.append(new_line)
...
>>> data
['Mr Moose:Maths', 'Mr Goose:History']

Answer 5

You can do this easily with a regex, using re.sub:

import re

re.sub(r"[\n \t]+$", "", "aaa \t asd \n ")
Out[17]: 'aaa \t asd'

The first argument is the pattern: [all characters you want to remove]+, where + means one or more matches and $ means the end of the string.

https://docs.python.org/2/library/re.html
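Note that the pattern above is anchored with $, so it only trims the end of the string. A sketch of one way to cover both ends with the same character class, compared against str.strip; the sample string is made up for illustration:

```python
import re

s = " \t aaa \t asd \n "

trailing_only = re.sub(r"[\n \t]+$", "", s)        # leading whitespace is kept
both_ends = re.sub(r"^[\n \t]+|[\n \t]+$", "", s)  # trims the start and the end

print(repr(trailing_only))            # ' \t aaa \t asd'
print(repr(both_ends))                # 'aaa \t asd'
print(both_ends == s.strip(" \t\n"))  # True
```

For this particular job str.strip gives the same result without a regular expression.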

Answer 6

With str.rstrip(chars) you can remove any trailing characters that appear in chars from the right-hand end of a string, like this:

a = 'Mr Moose \n'

print(a.rstrip(' \n')) # writes 'Mr Moose\n' instead of 'Mr Moose \n\n'
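It may help to see that the argument to rstrip is treated as a set of characters rather than as a suffix, and that calling it with no argument removes all trailing whitespace. A few quick checks, where the "banana" string is just an illustrative value:

```python
a = "Mr Moose \n"

print(repr(a.rstrip(" \n")))         # 'Mr Moose'
print(repr(a.rstrip()))              # 'Mr Moose' (no argument: all trailing whitespace)
print(repr(" Mr Moose \n".strip()))  # 'Mr Moose' (both ends, inner space kept)

# The argument is a set of characters, not a suffix:
print("banana".rstrip("an"))         # b
```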

6 answers

Answer 1: score 7, accepted, 2017-07-17 14:15:52
Answer 2: score 3, 2017-07-17 14:15:36
Answer 3: score 2, 2017-07-17 14:07:49
Answer 4: score 2, 2017-07-17 14:08:54
Answer 5: score 1, 2017-07-17 14:16:29
Answer 6: score -2, 2017-07-17 14:03:33
