I am trying to write a Python script that "cleans" lines of text read from a file, like this:
for i in range(1,10):
    number = 1
    cleanText = re.sub('number.','',line).strip()
    number = number + 1
    print cleanText
An example file would be:
1. Hello, World
2. Hello earth
What I need to do here is remove the numbering and the dots along with leading blank spaces in one fell swoop. But how on earth can I first perform a simple variable expansion?
Thank you all in advance.
If your file format is guaranteed to be like you said:
1. Hello, World
2. Hello earth
You don't even need to use a regex; you could just use split and join:
clean_line = ' '.join(line.split(' ')[1:]).lstrip()
>>> ' '.join("1. Hello, world".split(' ')[1:])
'Hello, world'
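A small variation on the split/join idea, as a sketch in Python 3 syntax: passing a maxsplit of 1 to split cuts the line only at the first space, so no join is needed. The helper name strip_numbering is made up for illustration, and the code assumes every line starts with an "N. " prefix.

```python
def strip_numbering(line):
    """Drop the first space-separated token (assumed to be 'N.') from a line."""
    # A maxsplit of 1 splits only at the first space, leaving the rest untouched
    parts = line.strip().split(' ', 1)
    # A line with no space has nothing after the prefix to keep
    return parts[1].lstrip() if len(parts) == 2 else ''


for raw in ["1. Hello, World\n", "2. Hello earth\n"]:
    print(strip_numbering(raw))
```

Unlike the regex approaches, this does not check that the first token really is a number, so it is only safe when the file format is guaranteed.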
Or, if you still wanted to do substitution, this replace-based code may work:
number = 1
for line in file_handle:
    clean_line = line.replace("%d. " % number, "").lstrip()
    number += 1
As others said, you should simply use a regular expression that matches any number, such as r"\d" (a single digit) or r"\d+" (one or more digits). However, for learning purposes, here is the answer to what you did ask.
The closest useful equivalent of "variable expansion" is the string formatting operator:
cleanText = re.sub('%d.' % number, '', line).strip()
You could also use str(number) + '.' to achieve the same effect. There are several more problems with your code:
Your loop is wrong; if you're iterating over range(1, 10), then you don't need to increment number manually.
You probably meant range(1, 11), since range excludes its upper bound.
In regular expression syntax, . matches any character; to match a literal dot you want \.
A cleaned-up version might look like this:
cleanText = line.strip()
for i in xrange(1, 11):
    # substitute the loop variable into the pattern; \. matches a literal dot
    cleanText = re.sub(r'%d\.' % i, '', cleanText)
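To see why the dot needs escaping, and as an alternative to escaping it by hand, here is a small sketch in Python 3 syntax. It uses re.escape to build the pattern from the variable. The helper name remove_prefix is hypothetical, and the pattern is anchored with ^ (an addition not in the code above) so that only a leading number is removed.

```python
import re


def remove_prefix(line, number):
    """Remove a leading '<number>.' and surrounding whitespace from line."""
    # re.escape makes the dot literal, so the text "1." becomes the pattern 1\.
    pattern = r'^\s*' + re.escape('%d.' % number) + r'\s*'
    return re.sub(pattern, '', line).strip()


# An unescaped dot matches any character, so '1.' also matches the "1x" in "1x2"
print(re.sub('1.', '', '1x2'))      # -> 2
print(re.sub(r'1\.', '', '1x2'))    # -> 1x2 (no literal "1." present)
print(remove_prefix('1. Hello, World', 1))  # -> Hello, World
```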
import re
fp = open('line','r')
for line in fp:
    pattern = re.match(r'[0-9]*\.(.*)',line)
    if pattern:
        print pattern.group(1)
    else:
        print line
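The \d+ suggestion made in the answers can be sketched as a single re.sub call that does not depend on a counter at all. This is Python 3 syntax; the function name clean_line and the sample list are illustrative, not taken from the original posts.

```python
import re

# Optional leading whitespace, one or more digits, a literal dot, then spaces
NUMBERING = re.compile(r'^\s*\d+\.\s*')


def clean_line(line):
    """Strip a leading 'N.' numbering from line; other lines pass through stripped."""
    return NUMBERING.sub('', line).strip()


sample = ["1. Hello, World\n", "2. Hello earth\n", "no numbering here\n"]
for raw in sample:
    print(clean_line(raw))
```

Anchoring the pattern at the start of the line keeps a number that appears later in the text, such as "version 2. release", from being removed.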