
Strip whitespace and newlines when reading from a file

I have the following code, which successfully strips end-of-line characters when reading from a file, but doesn't remove leading and trailing whitespace (I want the spaces between words to be kept!).

What is the best way to achieve this? (Note: this is a specific example, so it is not a duplicate of the general methods for stripping strings.)

My code is below. Try it with the test input "Mr Moose" (not found), and then with "Mr Moose " (that is, with a space after Moose), which works.

# A COMMON ERROR is leaving in blank spaces and then finding you cannot work with the data the way you want!

"""Try the following program with the input: Mr Moose
...it doesn't work...
but if you try "Mr Moose " (that is, with a space after Moose), it works!
So how do I remove both newlines AND leading and trailing spaces when reading from a file into a list?
Note: the spaces between words must remain.
"""

alldata=[]
col_num=0
teacher_names=[]
delimiter=":"

with open("teacherbook.txt") as f:
    for line in f.readlines():
        alldata.append(line.strip())
    print(alldata)

    print()
    print()

    for x in alldata:
        teacher_names.append(x.split(delimiter)[col_num])

    teacher = input("Enter teacher you are looking for:")
    if teacher in teacher_names:
        print("found")
    else:
        print("No")

Desired output when producing the list alldata:

['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']

i.e. remove all leading and trailing whitespace, including any whitespace before or after the delimiter. The spaces between words, such as in Mr Moose, must be kept.

Contents of teacherbook.txt:

Mr Moose : Maths
Mr Goose: History
Mrs Congenelipilling: English
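To see why "Mr Moose" is not found, here is a minimal check, assuming the first line of teacherbook.txt is exactly "Mr Moose : Maths". strip() only trims whitespace at the two ends of the whole line, so the space before the colon survives the split:

```python
line = "Mr Moose : Maths\n"

# strip() removes only the trailing newline here; the space before ':' is untouched
name = line.strip().split(":")[0]
print(repr(name))  # 'Mr Moose '

teacher_names = [name]
print("Mr Moose" in teacher_names)   # False
print("Mr Moose " in teacher_names)  # True
```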

Thanks in advance

You could use a regex:

>>> import re
>>> txt = '''\
... Mr Moose : Maths
... Mr Goose: History
... Mrs Congenelipilling: English'''
>>> [re.sub(r'\s*:\s*', ':', line).strip() for line in txt.splitlines()]
['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']

So your code becomes:

import re
col_num=0
teacher_names=[]
delimiter=":"

with open("teacherbook.txt") as f:
    alldata=[re.sub(r'\s*{}\s*'.format(delimiter), delimiter, line).rstrip() for line in f]
    print(alldata)

    for x in alldata:
        teacher_names.append(x.split(delimiter)[col_num])
    print(teacher_names)

Prints:

['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']
['Mr Moose', 'Mr Goose', 'Mrs Congenelipilling']

The key part is the regex:

re.sub(r'\s*{}\s*'.format(delimiter), delimiter, line).rstrip()

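\s*   zero or more whitespace characters before the delimiter
{}    placeholder that str.format fills with the delimiter
\s*   zero or more whitespace characters after the delimiter

and .rstrip() removes the newline (and any other trailing whitespace) at the end of the line.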
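The delimiter is pasted into the pattern as-is, which is fine for ':' but would misbehave for a delimiter that is a regex metacharacter, such as '|' or '.'. Below is a sketch that escapes it first with re.escape and also strips both ends of the line; the helper name normalize_line and the '|' delimiter are hypothetical, chosen only for illustration:

```python
import re

def normalize_line(line, delimiter):
    """Hypothetical helper: remove whitespace around the delimiter and at both ends."""
    # re.escape makes the delimiter match literally, even if it is '|' or '.'
    pattern = r'\s*' + re.escape(delimiter) + r'\s*'
    # a function as the replacement avoids backslash processing in the delimiter
    return re.sub(pattern, lambda m: delimiter, line).strip()

print(normalize_line("  Mr Moose | Maths \n", "|"))  # Mr Moose|Maths
print(normalize_line("Mr Moose : Maths\n", ":"))     # Mr Moose:Maths
```

Without re.escape, the pattern for '|' would be \s*|\s*, an alternation that also matches the empty string, so the "delimiter" would be inserted at every position.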



For a pure-Python solution (no regex), I would use str.partition to get the left-hand and right-hand sides of the delimiter, then strip the whitespace as needed:

alldata=[]    
with open("teacherbook.txt") as f:
    for line in f:
        lh,sep,rh=line.rstrip().partition(delimiter)
        alldata.append(lh.rstrip() + sep + rh.lstrip())

This gives the same output.
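To make the partition step concrete, here is what it returns for one line of the file, plus a line with no delimiter at all (that second line is a made-up example):

```python
line = "Mr Moose : Maths\n"

# partition splits at the first delimiter only and always returns a 3-tuple
lh, sep, rh = line.rstrip().partition(":")
print((lh, sep, rh))                    # ('Mr Moose ', ':', ' Maths')
print(lh.rstrip() + sep + rh.lstrip())  # Mr Moose:Maths

# with no delimiter, sep and rh are empty strings, so the unpacking never fails
print("Mr Moose".partition(":"))        # ('Mr Moose', '', '')
```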


Another suggestion: your data is better suited to a dict than a list.

You can do:

di={}
with open("teacherbook.txt") as f:
    for line in f:
        lh,sep,rh=line.rstrip().partition(delimiter)
        di[lh.rstrip()]=rh.lstrip()

Or, as a comprehension:

with open("teacherbook.txt") as f:
    di={lh.rstrip():rh.lstrip() 
          for lh,_,rh in (line.rstrip().partition(delimiter) for line in f)}

Then access like this:

>>> di['Mr Moose']
'Maths'
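With the dict, the question's "found"/"No" check becomes a membership test on the keys. A sketch, assuming io.StringIO as a stand-in for the real file and a hard-coded teacher in place of input():

```python
import io

delimiter = ":"
# io.StringIO stands in for open("teacherbook.txt") so the sketch runs without the file
f = io.StringIO("Mr Moose : Maths\nMr Goose: History\nMrs Congenelipilling: English\n")

di = {}
for line in f:
    lh, sep, rh = line.rstrip().partition(delimiter)
    di[lh.rstrip()] = rh.lstrip()

teacher = "Mr Moose"  # would come from input() in the original program
if teacher in di:
    print("found, teaches", di[teacher])
else:
    print("No")
```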

There's no need to use readlines(); you can iterate over the file object directly to get each line, and use strip() to remove the \n and surrounding whitespace. For example, with this list comprehension:

with open('teacherbook.txt') as f:
    alldata = [':'.join([value.strip() for value in line.split(':')]) 
               for line in f]
    print(alldata)

Output:

['Mr Moose:Maths', 'Mr Goose:History', 'Mrs Congenelipilling:English']
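Because split(':') cuts at every colon and ':'.join puts them all back, a line with more than one colon keeps all of its fields, each one stripped. The extra "Year 9" field below is hypothetical, just to show the behaviour:

```python
line = " Mr Moose : Maths : Year 9 \n"

# every ':'-separated field is stripped, then the fields are rejoined with ':'
normalized = ':'.join(value.strip() for value in line.split(':'))
print(normalized)  # Mr Moose:Maths:Year 9
```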

Change:

teacher_names.append(x.split(delimiter)[col_num])

to:

teacher_names.append(x.split(delimiter)[col_num].strip())
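A quick check of this change, using the alldata list that the question's code produces (line.strip() leaves the spaces around the colon in place):

```python
alldata = ['Mr Moose : Maths', 'Mr Goose: History', 'Mrs Congenelipilling: English']
delimiter = ":"
col_num = 0

# strip() each name after splitting, so 'Mr Moose ' becomes 'Mr Moose'
teacher_names = [x.split(delimiter)[col_num].strip() for x in alldata]
print(teacher_names)                # ['Mr Moose', 'Mr Goose', 'Mrs Congenelipilling']
print("Mr Moose" in teacher_names)  # True
```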

You want to "remove all leading and trailing white space at the start, and before or after the delimiter", while keeping the spaces between words such as Mr Moose.

You can split your string at the delimiter, strip the whitespace from each part, and join the parts back together:

for line in f.readlines():
    new_line = ':'.join([s.strip() for s in line.split(':')])
    alldata.append(new_line)

Example:

>>> lines = ['  Mr Moose :   Maths', ' Mr Goose :  History  ']
>>> lines
['  Mr Moose :   Maths', ' Mr Goose :  History  ']
>>> data = []
>>> for line in lines:
...     new_line = ':'.join([s.strip() for s in line.split(':')])
...     data.append(new_line)
...
>>> data
['Mr Moose:Maths', 'Mr Goose:History']
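One caveat with this approach: a blank or whitespace-only line (for example an empty last line in the file, which is an assumption about the file) would become an empty string '' in the list. A sketch that skips such lines:

```python
lines = ['  Mr Moose :   Maths', '', ' Mr Goose :  History  ', '   ']

# skip lines that are empty or whitespace-only before normalizing the rest
data = [':'.join(s.strip() for s in line.split(':')) for line in lines if line.strip()]
print(data)  # ['Mr Moose:Maths', 'Mr Goose:History']
```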

You can do it with a regex using re.sub:

>>> import re
>>> re.sub(r"[\n \t]+$", "", "aaa \t asd \n ")
'aaa \t asd'

The first argument is the pattern:

[...]  the characters you want to remove (here newline, space and tab)
+      one or more of them
$      at the end of the string

https://docs.python.org/2/library/re.html
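The pattern above only trims the right-hand end. A sketch that trims both ends with an alternation; note that, like str.strip(), it still leaves the spaces around the ':' delimiter alone, so it would need to be combined with one of the delimiter-based approaches to give the desired output:

```python
import re

s = "  Mr Moose : Maths \n"

# ^\s+ matches leading whitespace, \s+$ trailing whitespace; inner spaces are kept
result = re.sub(r"^\s+|\s+$", "", s)
print(repr(result))  # 'Mr Moose : Maths'
```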

With str.rstrip(chars) you can remove any of the characters in chars from the right end of a string, like this:

a = 'Mr Moose \n'

print(repr(a.rstrip(' \n')))  # prints 'Mr Moose' instead of 'Mr Moose \n'
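Keep in mind that the argument to rstrip is treated as a set of characters, not as a suffix to remove. The 'banana' string below is just an illustration of that behaviour:

```python
a = 'Mr Moose \n'

# ' ' and '\n' are stripped from the right in any order until another character appears
print(repr(a.rstrip(' \n')))        # 'Mr Moose'

# 'a' and 'n' are both in the set, so everything after the 'b' goes
print(repr('banana'.rstrip('an')))  # 'b'
```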
