Text file (file.txt) looks like this:
First line.
2. Second line
03 Third line
04. Fourth line
5. Line.
6 Line
The desired output 1) removes the numbers at the beginning of each line and 2) removes the punctuation that follows those numbers:
First line.
Second line
Third line
Fourth line
Line.
Line
I tried:
import re
file=open("file.txt").read().split()
print([i for i in file if re.sub("[0-9]\.*", "", i)])
But I get results only on word level instead of line level:
['First', 'line.', 'Second', 'line', 'Third', 'line', 'Fourth', 'line', 'Line.', 'Line']
Do not use the re module in a for loop. Regular expressions offer many possibilities, and the re module can also work in multiline mode on the whole text. For example, use the following:
>>> with open('/tmp/file.txt', 'r') as f:
...     s = f.read()
>>> # or use direct value to test in the Python console:
>>> s = """First line.
... 2. Second line
... 03 Third line
... 04. Fourth line
... 5. Line.
... 6 Line"""
>>> s
'First line.\n2. Second line \n03 Third line\n04. Fourth line\n5. Line. \n6 Line'
>>> import re
>>> re.sub(r'[0-9\.\s]*(.*)', r'\1\n', s, flags=re.M)
'First line.\nSecond line \nThird line\nFourth line\nLine. \nLine\n'
>>> re.sub(r'^[0-9\.\s]*(.*)', r'\1', s, flags=re.M)
'First line.\nSecond line \nThird line\nFourth line\nLine. \nLine'
You may fix your current code using
import re
with open("file.txt") as f:
    for line in f:
        # Strip the newline, then remove a leading number, optional dot and spaces
        print(re.sub(r"^[0-9]+\.?\s*", "", line.rstrip("\n")))
You need to open the file and read it line by line. Then the ^[0-9]+\.?\s* pattern searches for 1 or more digits ([0-9]+) followed by an optional . (\.?) and then 0 or more whitespace characters (\s*) at the start of each line, and removes the match if found.
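The pattern described above can be tried without a file. This is only a minimal sketch: the helper name strip_line_number and the in-memory sample list are assumptions made for illustration and are not part of the original answer.

```python
import re

# Leading digits, an optional dot, then any whitespace, anchored at the line start.
PREFIX = re.compile(r"^[0-9]+\.?\s*")

def strip_line_number(line):
    """Hypothetical helper: return the line without its leading number prefix."""
    return PREFIX.sub("", line.rstrip("\n"))

# Sample data standing in for the lines read from file.txt
sample = [
    "First line.\n",
    "2. Second line\n",
    "03 Third line\n",
    "04. Fourth line\n",
    "5. Line.\n",
    "6 Line\n",
]

cleaned = [strip_line_number(line) for line in sample]
print(cleaned)
```

Compiling the pattern once keeps the loop body short. Lines that do not start with a digit, such as the first one, pass through unchanged.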
The split in this line
file=open("file.txt").read().split()
splits the file on every run of whitespace (spaces and newlines alike), so each word becomes a separate item. Use
file=open("file.txt").read().split("\n")
instead to split the file by lines.
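As a rough sketch of the line-level version of the original attempt, the example below uses str.splitlines(), a close alternative to split("\n") that does not leave an empty last item when the text ends with a newline. It also moves re.sub from the filter position of the comprehension into the expression, so the cleaned text is what ends up in the list. The variable names and the in-memory text are assumptions for illustration.

```python
import re

# Stand-in for open("file.txt").read(); note the trailing newline
text = "First line.\n2. Second line\n03 Third line\n04. Fourth line\n5. Line.\n6 Line\n"

# splitlines() splits on line boundaries and drops the line endings
lines = text.splitlines()

# Apply the substitution to each line instead of using it as a filter
result = [re.sub(r"^[0-9]+\.?\s*", "", line) for line in lines]
print(result)

# For comparison: split("\n") keeps an empty string after the final newline
print(text.split("\n")[-1] == "")
```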
Another option is:
import re
f = """First line.
2. Second line
03 Third line
04. Fourth line
5. Line.
6 Line"""
print(re.sub(r"(\d{1,2}\.{,1}\s)", "", f))
it returns:
First line.
Second line
Third line
Fourth line
Line.
Line
It does not have to loop through each line.
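One caveat with this single-pass approach: the pattern is not anchored, so it also removes a one- or two-digit number followed by whitespace in the middle of a line. A possible variant, sketched here under the assumption that only prefixes at the start of a line should go, anchors the pattern with ^ and re.MULTILINE. The sample text with "Room 12" is invented for the comparison.

```python
import re

# Invented sample: the third line has a number in the middle, not at the start
text = "First line.\n2. Second line\nRoom 12 is free\n04. Fourth line"

# Unanchored pattern from the answer above: also eats "12 " inside the line
unanchored = re.sub(r"\d{1,2}\.{,1}\s", "", text)

# Anchored variant: ^ matches at every line start because of re.MULTILINE;
# [ \t]* is used instead of \s* so the match cannot run across a newline
anchored = re.sub(r"^\d+\.?[ \t]*", "", text, flags=re.MULTILINE)

print(unanchored)
print(anchored)
```

Both calls still process the whole text at once, so no explicit loop is needed.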