
Remove spaces from the beginning of strings read from a text file in Python

I have a list like the one below that I need to split into prefix/root/suffix.

Input
form
jalan
ba-jalan
pem-porut#an
daun #kulu
daun#kulu
tarik-napas
tarik#napas
n-cium #bow
arau/araw
imbaw//nimbaw
dengo | nengo
dodop=am
{di} dalam
di {dalam}

I have done it with the regex below in Python:

import re
import sys

# Send everything printed below to final.txt
sys.stdout = open('final.txt', 'w')

# Read the input words, one per line
with open('split.txt') as f:
    new_split = [item.strip() for item in f.readlines()]

for word in new_split:
    m = re.match(r"(?:\{[^-#={}/|]+\})?(?:([^-#={}/|]+)-)?([^-#={}/|]+)(?:/[^-#={}/|]+)?(?:[#=]([^-#={}/|]+))?", word)
    if m:
        # Groups are prefix, root, suffix; unmatched groups print as "None"
        print("\t".join([str(item) for item in m.groups()]))
    else:
        print("(no match: %s)" % word)

The output in final.txt looks like this:

None    jalan   None
ba  jalan   None
pem porut   an
None    daun    kulu
None    daun    kulu
tarik   napas   None
None    tarik   napas
n   cium    bow
None    arau    None
None    imbaw   None
None    dengo   None
None    dodop   am
None     dalam  None
None    di  None

As you can see, near the bottom of the output there is an extra space before dalam, and some other words also have an extra space in front of them. How can I remove those extra spaces from final.txt? Can I do it in the script above, or should I do it in a separate script? Thanks.

Call lstrip() on each string to remove leading whitespace:

str(item).lstrip()

Code:

import re
with open('split.txt') as w:
    new_split = [item.strip() for item in w.readlines()]


for word in new_split:
    m = re.match(r"(?:\{[^-#={}/|]+\})?(?:([^-#={}/|]+)-)?([^-#={}/|]+)(?:/[^-#={}/|]+)?(?:[#=]([^-#={}/|]+))?", word)
    if m:
        print("\t".join([str(item).lstrip() for item in m.groups()]))
    else:
        print("(no match: %s)" % word)
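
Note that lstrip() removes only leading whitespace. With this regex, some groups can also pick up a trailing space (for example the root of "daun #kulu" is captured as "daun "), and str(item) turns an unmatched group into the text "None". The following is a minimal sketch, assuming you want both sides stripped and want to keep real None values; the helper name split_word is hypothetical and not part of the original post.

```python
import re

# Same pattern as in the question, compiled once for reuse.
PATTERN = re.compile(
    r"(?:\{[^-#={}/|]+\})?(?:([^-#={}/|]+)-)?([^-#={}/|]+)"
    r"(?:/[^-#={}/|]+)?(?:[#=]([^-#={}/|]+))?"
)


def split_word(word):
    """Return (prefix, root, suffix) with surrounding whitespace removed.

    split_word is a hypothetical helper name used for illustration.
    Groups that did not take part in the match stay None.
    Returns None when the pattern does not match at all.
    """
    m = PATTERN.match(word)
    if m is None:
        return None
    # strip() removes whitespace on both sides; lstrip() would leave "daun ".
    return tuple(g.strip() if g is not None else None for g in m.groups())


if __name__ == "__main__":
    for sample in ["{di} dalam", "di {dalam}", "daun #kulu", "pem-porut#an"]:
        print(split_word(sample))
```

For "{di} dalam" this returns (None, 'dalam', None), and for "daun #kulu" it returns (None, 'daun', 'kulu'). If you still want tab-separated text in final.txt, join the tuple with str(item) as in the code above.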
