I know how to solve my problem without lambda/map, using a for loop with string.replace() (I can't use regex or other Python libraries for this exercise), but I am trying to see whether I can achieve the same result with a combination of map/lambda and string.replace.
My goal here is to read in a txt file and then substitute every instance of a non-standard 'e' (like éêèÉÊÈ) with the letter 'e'.
My main issue now is that I get a list of 6 strings (newFile / newFileListComprehension), and each string is the original string with only the one character that was evaluated replaced,
e.g. newFile[0] = output of .replace('é', 'e'), newFile[1] = output of .replace('ê', 'e'), etc.
What I would like is to return a single copy of the formatted string, with all of the .replace() calls applied to it.
Link to the text file I am referencing below can be accessed https://easyupload.io/s7m0zj
import string

def file2str(filename):
    with open(filename, 'r', encoding="utf8") as fid:
        return fid.read()

def count_letter_e(filename, ignore_accents, ignore_case):
    eSToAvoid = 'éêèÉÊÈ'
    textFile = file2str("Sentence One.txt")
    newFileListComprehension = [textFile.replace(e,'e') for e in eSToAvoid if ignore_accents == 1]
    value = textFile.count('e')
    #newFile = list((map(lambda element: (textFile.replace(element, 'e') if ignore_accents == 1 else textFile.count('e')), eSToAvoid)))
    return 0

numberOfEs = count_letter_e("Sentence One.txt", 1, 1)
You can use str.translate for replacing multiple characters at once. str.maketrans helps you create the required mapping:
eSToAvoid = 'éêèÉÊÈ'
textFile.translate(str.maketrans(eSToAvoid, 'e' * len(eSToAvoid)))
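A minimal sketch of how the translate approach might be packaged for reuse. The names `E_VARIANTS`, `E_TABLE` and `normalize_e` are illustrative assumptions, not part of the original code; the table is built once so repeated calls do not rebuild it.

```python
# Characters to fold into a plain 'e' (same set as eSToAvoid in the question).
E_VARIANTS = 'éêèÉÊÈ'

# str.maketrans with two equal-length strings maps the i-th character of the
# first string to the i-th character of the second.
E_TABLE = str.maketrans(E_VARIANTS, 'e' * len(E_VARIANTS))

def normalize_e(text):
    """Return text with every accented e variant replaced by 'e'."""
    return text.translate(E_TABLE)

sample = 'testingè,É foobar è É'
print(normalize_e(sample))             # testinge,e foobar e e
print(normalize_e(sample).count('e'))  # 5
```

This returns one string with all six replacements applied, which is the result the question asks for.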
While str.replace can only replace one substring with another, re.sub can replace a pattern.
In [55]: eSToAvoid = 'éêèÉÊÈ'
In [58]: import re
test cases:
In [61]: re.sub(r'[éêèÉÊÈ]', 'e', 'foobar')
Out[61]: 'foobar'
In [62]: re.sub(r'[éêèÉÊÈ]', 'e', eSToAvoid)
Out[62]: 'eeeeee'
In [63]: re.sub(r'[éêèÉÊÈ]', 'e', 'testingè,É foobar è É')
Out[63]: 'testinge,e foobar e e'
The string replace approach is:
In [70]: astr = 'testingè,É foobar è É'
    ...: for e in eSToAvoid:
    ...:     astr = astr.replace(e,'e')
    ...:
In [71]: astr
Out[71]: 'testinge,e foobar e e'
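The replace is applied sequentially to astr: each call works on the result of the previous one. This can't be expressed as a list comprehension (or map), because each element there is computed independently from the original string. A list comprehension most naturally replaces a loop that collects its results in a list (with list.append).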
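To make the difference concrete, here is a small side-by-side sketch. The variable names (`chars`, `text`, `independent`, `accumulated`) are illustrative and not taken from the question.

```python
chars = 'éêèÉÊÈ'
text = 'testingè,É foobar è É'

# Comprehension: every element starts again from the ORIGINAL text,
# so each string has only one of the six characters replaced.
independent = [text.replace(c, 'e') for c in chars]

# Loop: each replace works on the result of the previous one,
# so the changes build up in a single string.
accumulated = text
for c in chars:
    accumulated = accumulated.replace(c, 'e')

print(len(independent))  # 6
print(independent[2])    # testinge,É foobar e É  (only 'è' replaced)
print(accumulated)       # testinge,e foobar e e
```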
There's nothing wrong with the for loop. On this short test string it's actually faster:
In [72]: %%timeit
    ...: astr = 'testingè,É foobar è É'
    ...: for e in eSToAvoid:
    ...:     astr = astr.replace(e,'e')
    ...:
1.37 µs ± 8.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [73]: timeit re.sub(r'[éêèÉÊÈ]', 'e', 'testingè,É foobar è É')
2.79 µs ± 15.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [77]: timeit astr.translate(str.maketrans(eSToAvoid, 'e' * len(eSToAvoid)))
2.56 µs ± 14.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [93]: from functools import reduce
In [96]: reduce(lambda s,e: s.replace(e,'e'),eSToAvoid, 'testingè,É foobar è É' )
Out[96]: 'testinge,e foobar e e'
In [97]: timeit reduce(lambda s,e: s.replace(e,'e'),eSToAvoid, 'testingè,É foobar è É' )
2.11 µs ± 32.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
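Since the question asks for a lambda-based version, the reduce call could be folded into a counting function along these lines. This is only a sketch under stated assumptions: it takes the text directly instead of a filename, it assumes `ignore_case` means that 'E' should be counted as well, and `E_VARIANTS` is an illustrative name.

```python
from functools import reduce

E_VARIANTS = 'éêèÉÊÈ'

def count_letter_e(text, ignore_accents=True, ignore_case=True):
    """Count occurrences of 'e' in text (assumed semantics, see lead-in)."""
    if ignore_accents:
        # reduce(function, iterable, initial): `s` starts as text and is
        # passed through one replace per character in E_VARIANTS.
        text = reduce(lambda s, c: s.replace(c, 'e'), E_VARIANTS, text)
    if ignore_case:
        # Assumption: ignoring case means 'E' counts as 'e'.
        text = text.lower()
    return text.count('e')

print(count_letter_e('testingè,É foobar è É'))                        # 5
print(count_letter_e('testingè,É foobar è É', ignore_accents=False))  # 1
```

Note that with `ignore_accents` set, the uppercase accented letters are folded to lowercase 'e' as in the answers above, so they are counted regardless of `ignore_case`.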
For fun you could also explore some of the ideas presented here:
Cleanest way to combine reduce and map in Python
You want to 'accumulate' changes, and to do that you need some sort of accumulator, something that hangs on to the result of the last replace. itertools has an accumulate function, and Py 3.8 introduced the := walrus operator.
In [110]: def foo(astr, es):
     ...:     for e in es:
     ...:         astr = astr.replace(e,'e')
     ...:         yield astr
     ...:
In [111]: list(foo(astr, eSToAvoid))
Out[111]:
['testingè,É foobar è É',
'testingè,É foobar è É',
'testinge,É foobar e É',
'testinge,e foobar e e',
'testinge,e foobar e e',
'testinge,e foobar e e']
Or use [s for s in foo(astr, eSToAvoid)] in place of list(). This highlights the fact that a list comprehension returns a list of strings, even if the strings accumulate the changes.
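The itertools.accumulate function and the walrus operator mentioned earlier can produce the same sequence of intermediate strings. The following is a sketch only: the function names `steps_accumulate` and `steps_walrus` are made up for illustration, and both the `initial` keyword of accumulate and the walrus operator need Python 3.8 or later.

```python
from itertools import accumulate

def steps_accumulate(text, chars):
    # accumulate yields the initial value first, then the result after each
    # replace, so the list has len(chars) + 1 entries.
    return list(accumulate(chars, lambda s, c: s.replace(c, 'e'), initial=text))

def steps_walrus(text, chars):
    # The walrus operator rebinds `s` in the enclosing function scope on every
    # iteration, so each replace sees the previous result.
    s = text
    return [(s := s.replace(c, 'e')) for c in chars]

sample = 'testingè,É foobar è É'
print(steps_accumulate(sample, 'éêèÉÊÈ')[-1])  # testinge,e foobar e e
print(steps_walrus(sample, 'éêèÉÊÈ')[-1])      # testinge,e foobar e e
```

In both cases the final string is the last element of the list, so a list is still returned and the result has to be picked out with [-1].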