
Python Map, Lambda, and string.replace

I know how to solve my problem without lambda/map, using a for loop with string.replace() (I can't use regex or other Python libraries for this exercise), but I am really trying to see whether I can achieve the same result using a combination of map/lambda and string.replace.

My goal here is to read in a txt file and then substitute every instance of a non-standard 'e' (like éêèÉÊÈ) with the letter 'e'.

My main issue now is that I get 6 strings instead of one: newFile / newFileListComprehension each contain 6 strings, and each string is a copy of the original in which only the one character evaluated in that iteration has been replaced.

For example, newFile[0] is the output of .replace('é'), newFile[1] is the output of .replace('ê'), etc.

So what I would like is to return 1 copy of the formatted string, with all of the .replace() calls applied to it.

The text file I am referencing below can be accessed at https://easyupload.io/s7m0zj

import string

def file2str(filename):
    with open(filename, 'r', encoding="utf8") as fid:
        return fid.read()

def count_letter_e(filename, ignore_accents, ignore_case):
    eSToAvoid = 'éêèÉÊÈ'
    textFile = file2str("Sentence One.txt")
    newFileListComprehension = [textFile.replace(e,'e') for e in eSToAvoid if ignore_accents == 1]
    value = textFile.count('e')
    #newFile = list((map(lambda element: (textFile.replace(element, 'e') if ignore_accents == 1 else textFile.count('e')), eSToAvoid)))
    return 0

numberOfEs = count_letter_e("Sentence One.txt", 1, 1)

You can use str.translate for replacing multiple characters at once. str.maketrans helps you create the required mapping:

eSToAvoid = 'éêèÉÊÈ'
textFile.translate(str.maketrans(eSToAvoid, 'e' * len(eSToAvoid)))
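As a minimal, self-contained sketch of the translate approach (the helper name `replace_accented_e` and the sample string are made up for illustration and do not come from the question):

```python
E_VARIANTS = 'éêèÉÊÈ'  # same characters as eSToAvoid in the question

# Build the translation table once; every accented character maps to a plain 'e'.
E_TABLE = str.maketrans(E_VARIANTS, 'e' * len(E_VARIANTS))

def replace_accented_e(text):
    """Return text with every character in E_VARIANTS replaced by 'e'."""
    return text.translate(E_TABLE)

print(replace_accented_e('été, Élève'))  # ete, eleve
```

Building the table outside the function means it is created only once, however many strings are translated.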

While str.replace can only replace one substring with another, re.sub can replace a pattern.

In [55]: eSToAvoid = 'éêèÉÊÈ' 
In [58]: import re 

Test cases:

In [61]: re.sub(r'[éêèÉÊÈ]', 'e', 'foobar')                                                                          
Out[61]: 'foobar'
In [62]: re.sub(r'[éêèÉÊÈ]', 'e', eSToAvoid)                                                                         
Out[62]: 'eeeeee'
In [63]: re.sub(r'[éêèÉÊÈ]', 'e', 'testingè,É  foobar  è É')                                                         
Out[63]: 'testinge,e  foobar  e e'

The string replace approach is:

In [70]: astr = 'testingè,É  foobar  è É' 
    ...: for e in eSToAvoid: 
    ...:     astr = astr.replace(e,'e') 
    ...:                                                                                                             
In [71]: astr                                                                                                        
Out[71]: 'testinge,e  foobar  e e'

The replace is applied sequentially to astr: each step works on the result of the previous one. This can't be expressed as a list comprehension (or map), because those evaluate every element independently. A list comprehension most naturally replaces a loop that collects its results in a list (with list.append).
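A short sketch of the difference, using a made-up two-letter sample string (the variable names here are illustrative only):

```python
es_to_avoid = 'éêèÉÊÈ'
original = 'è É'

# Comprehension: every replace() starts from the same unchanged original,
# so the result is six independent strings, each with at most one fix applied.
separate = [original.replace(e, 'e') for e in es_to_avoid]
print(separate)  # ['è É', 'è É', 'e É', 'è e', 'è É', 'è É']

# Loop: the name is rebound each time, so the replacements build on each other.
text = original
for e in es_to_avoid:
    text = text.replace(e, 'e')
print(text)  # e e
```

This is the same "6 lists" effect described in the question: none of the six strings in the comprehension has all replacements applied.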

There's nothing wrong with the for loop. In the timings below it's actually the fastest option for this short string:

In [72]: %%timeit 
    ...: astr = 'testingè,É  foobar  è É' 
    ...: for e in eSToAvoid: 
    ...:     astr = astr.replace(e,'e') 
    ...:  
    ...:                                                                                                             
1.37 µs ± 8.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [73]: timeit re.sub(r'[éêèÉÊÈ]', 'e', 'testingè,É  foobar  è É')                                                  
2.79 µs ± 15.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [77]: timeit astr.translate(str.maketrans(eSToAvoid, 'e' * len(eSToAvoid)))                                       
2.56 µs ± 14.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

reduce

In [93]: from functools import reduce  
In [96]: reduce(lambda s,e: s.replace(e,'e'),eSToAvoid, 'testingè,É  foobar  è É' )                                  
Out[96]: 'testinge,e  foobar  e e'
In [97]: timeit reduce(lambda s,e: s.replace(e,'e'),eSToAvoid, 'testingè,É  foobar  è É' )                           
2.11 µs ± 32.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
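Putting the reduce idea back into the shape of the original exercise might look like the sketch below. It works on a string rather than a file so that it is self-contained, the name `count_e` is hypothetical, and the meaning of the two flags is an assumption on my part (the question does not spell it out): ignore_accents turns all six accented characters into 'e', as in the question, and ignore_case also counts a plain 'E'.

```python
from functools import reduce

E_VARIANTS = 'éêèÉÊÈ'  # same characters as eSToAvoid above

def count_e(text, ignore_accents=False, ignore_case=False):
    """Count the letter 'e' in text (hypothetical helper, not from the original post)."""
    if ignore_accents:
        # Fold over the accented characters: each step replaces one of them
        # in the string produced by the previous step.
        text = reduce(lambda s, ch: s.replace(ch, 'e'), E_VARIANTS, text)
    if ignore_case:
        # Assumption: ignoring case means a plain 'E' is counted as well.
        text = text.lower()
    return text.count('e')

sample = 'Élève, Être, eel, E'
print(count_e(sample))                                         # 4
print(count_e(sample, ignore_accents=True, ignore_case=True))  # 8
```

To use it with the file from the question, pass the result of file2str(filename) as text.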

For fun you could also explore some of the ideas presented here:

Cleanest way to combine reduce and map in Python

You want to 'accumulate' changes, and to do that you need some sort of accumulator, something that hangs on to the result of the last replace. itertools has an accumulate function, and Py 3.8 introduced the := walrus operator.
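A rough sketch of both ideas, assuming Python 3.8 or later (needed for the `initial=` argument and for `:=`); the sample string and variable names are only for illustration:

```python
from itertools import accumulate

es_to_avoid = 'éêèÉÊÈ'
astr = 'è É'

# accumulate() yields every intermediate string; initial= seeds it with astr,
# so the output has one more item than es_to_avoid. The last item is the result.
steps = list(accumulate(es_to_avoid, lambda s, e: s.replace(e, 'e'), initial=astr))
final_accumulate = steps[-1]

# Walrus: s is rebound inside the comprehension, so each replace() sees the
# previous result. The comprehension still builds a list of every step.
s = astr
walrus_steps = [s := s.replace(e, 'e') for e in es_to_avoid]
final_walrus = walrus_steps[-1]

print(final_accumulate)  # e e
print(final_walrus)      # e e
```

Both variants still build a list of intermediate strings only to keep the last one, which is why the plain loop or reduce is usually the tidier choice.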

generator

In [110]: def foo(astr, es): 
     ...:     for e in es: 
     ...:         astr = astr.replace(e,'e') 
     ...:         yield astr 
     ...:                                                                                                            
In [111]: list(foo(astr, eSToAvoid))                                                                                 
Out[111]: 
['testingè,É  foobar  è É',
 'testingè,É  foobar  è É',
 'testinge,É  foobar  e É',
 'testinge,e  foobar  e e',
 'testinge,e  foobar  e e',
 'testinge,e  foobar  e e']

Or [s for s in foo(astr, eSToAvoid)] in place of the list(). This highlights the fact that a list comprehension returns a list of strings, even if the strings accumulate the changes.
