How to loop over a list of strings, apply some string manipulations, and return them?
Here is what I came up with:
import re

a = 'bats bear'
b = 'cats pear'

def sub_strings(a, b):
    for s in [a, b]:
        s = re.sub('\\b.ear\\b', '', s)
    return a, b

a, b = sub_strings(a, b)
But that doesn't work at all: the function still returns the original strings ('bats bear', 'cats pear'). What's wrong with this approach?
s = re.sub('\\b.ear\\b', '', s)
does not do what you think it does. It merely rebinds the variable named s to the modified string returned by re.sub(). It does not alter the variables a or b. You can check that by printing out the value of s in the loop.
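A minimal sketch of that check, assuming the same two strings as in the question. The loop prints each new value of s, then a and b are printed afterwards so the two can be compared:

```python
import re

a = 'bats bear'
b = 'cats pear'

for s in [a, b]:
    # s is rebound to the new string returned by re.sub();
    # a and b keep referring to the original strings
    s = re.sub(r'\b.ear\b', '', s)
    print(repr(s))

print(repr(a), repr(b))  # a and b are unchanged
```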
Instead, you can return a generator:
def sub_strings(a, b):
    return (re.sub(r'\b.ear\b', '', s) for s in (a, b))
A list comprehension will also work:
def sub_strings(a, b):
    return [re.sub(r'\b.ear\b', '', s) for s in (a, b)]
Either way, the result will be unpacked into the variables a and b as required.
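As a rough illustration of the call site, here is a self-contained sketch using the generator version. It assumes the input strings from the question; the list comprehension version is called the same way:

```python
import re

def sub_strings(a, b):
    # Returns a generator; the substitutions run when it is consumed
    return (re.sub(r'\b.ear\b', '', s) for s in (a, b))

a = 'bats bear'
b = 'cats pear'

# Tuple unpacking consumes the generator and binds one result to each name
a, b = sub_strings(a, b)
print(repr(a), repr(b))
```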
You might want to consider generalising the function so that it accepts an arbitrary number of parameters:
def sub_strings(*args):
    return (re.sub(r'\b.ear\b', '', s) for s in args)
Now you can call it with any number of arguments:
>>> print(list(sub_strings('bats bear', 'cats pear', 'rats hear')))
['bats ', 'cats ', 'rats ']
>>> print(list(sub_strings('bats bear', 'cats pear', 'rats hear', 'gnats rear')))
['bats ', 'cats ', 'rats ', 'gnats ']
Try this:
import re

a = 'bats bear'
b = 'cats pear'

def sub_strings(a, b):
    result = []
    for s in [a, b]:
        result.append(re.sub('\\b.ear\\b', '', s))
    return result[0], result[1]

a, b = sub_strings(a, b)
The problem you are having is that in Python, strings (i.e., str type objects) are immutable objects. Because a string object cannot be changed, ANY method you call on a string never changes the original string. It ALWAYS remains the same:
>>> s = 'abc'
>>> s.replace('abc', 'def')  # call some method on s
'def'
>>> print(s)  # has s been changed? NOPE
abc
If you want to get a manipulated version of your string, you have to save the manipulated version somewhere and return THAT. The other answers that have been provided show clearly how to do this.
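A small sketch of "saving the manipulated version somewhere". The name t is only illustrative:

```python
s = 'abc'

# str.replace() returns a new string; s itself is untouched
t = s.replace('abc', 'def')
print(s)  # abc
print(t)  # def

# To "update" s, rebind the name to the returned string
s = s.replace('abc', 'def')
print(s)  # def
```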
As for your actual problem, I would suggest using a generator. A generator function behaves very differently from a normal function. One difference is that a generator function can produce multiple results, one at a time, from a single function call.
To make a generator, instead of using the word return, you use yield. Here is an example:
import re

a = 'bats bear'
b = 'cats pear'

def sub_strings(*strings):
    for s in strings:
        yield re.sub('\\b.ear\\b', '', s)

a, b = sub_strings(a, b)  # generator is "unpacked" here
Note that the *strings syntax allows the function to accept multiple arguments. The arguments are available inside your function as a tuple with the name strings.
The reason the above code works is that the last line automatically UNPACKS the generator. In other words, each result is yielded one at a time and assigned to the corresponding name on the left-hand side, one at a time.
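To make the "one at a time" behaviour visible, here is a hedged sketch that pulls values out by hand with next() instead of unpacking. The generator function is redefined so the snippet runs on its own, and the names gen, first and second are only illustrative:

```python
import re

def sub_strings(*strings):
    for s in strings:
        yield re.sub(r'\b.ear\b', '', s)

gen = sub_strings('bats bear', 'cats pear')

# Each next() call resumes the function until the next yield
first = next(gen)
second = next(gen)
print(repr(first), repr(second))

# The generator is now exhausted; a default avoids StopIteration
print(next(gen, None))
```

Unpacking with a, b = ... does essentially the same thing: it keeps requesting values until the generator is exhausted and checks that the count matches the number of names.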
Be careful, however, that you don't try to do THIS:
a = sub_strings(a) # BAD!
This will NOT work the way you expect. It will not work because a = sub_strings(a) does not unpack the generator; it instead creates a generator and assigns it to a. The generator has NOT been unpacked. Clarification on terminology: sub_strings is a generator function; sub_strings(a, b, c) creates a generator using that generator function.
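One way to see the distinction is with the standard inspect module. This is only an illustrative sketch: the generator function is redefined so the snippet is self-contained, and the name result is an assumption:

```python
import inspect
import re

def sub_strings(*strings):
    for s in strings:
        yield re.sub(r'\b.ear\b', '', s)

a = 'bats bear'
result = sub_strings(a)  # no unpacking: result is a generator object

print(inspect.isgeneratorfunction(sub_strings))  # True
print(inspect.isgenerator(result))               # True
print(isinstance(result, str))                   # False
```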
To unpack the generator to a single name, do this instead:
a, = sub_strings(a) # Note the comma
The extra comma makes the assignment target a one-element tuple of names instead of a single name. This lets the interpreter know that you mean to "unpack" the generator into the lone name, a.
I like this syntax very much because it keeps you from making errors that are not always easy to see. For example, if you provide too many arguments to sub_strings but not enough variables, it will raise an error and let you know there is a problem:
>>> a, b = sub_strings(a, b, c) # extra c argument
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
Another way to use your generator is to simply stuff the results into a list, a tuple, or anything else that accepts an iterable object (generators are iterable):
>>> results = list(sub_strings(a, b, c, d, e, f))
There is also another very nice alternative syntax that does the same thing. Here we see that star again (some people call it the "splat"). The splat "unpacks" the generator one value at a time, much the same as it was automatically unpacked before:
>>> results = [*sub_strings(a, b, c, d, e, f)]
Lastly: you don't even have to define a function to make a generator. You can instead just use what is called a generator expression.
>>> a, b = (re.sub('\\b.ear\\b', '', s) for s in (a, b))
You can use such an expression in any of the places we used our generator above:
>>> results = list((re.sub('\\b.ear\\b', '', s) for s in (a, b)))
>>> results = [*(re.sub('\\b.ear\\b', '', s) for s in (a, b))]
Observe that the generator expression replaces the generator function call (which creates a generator) used in the previous versions of the code.
However, if your goal is a list, an even shorter syntax is to use what is called a list comprehension:
>>> results = [re.sub('\\b.ear\\b', '', s) for s in (a, b)]
There is much, MUCH more to learn about Python generators. Go here to get started.