
What is the pythonic way to remove trailing spaces from a string?

The parameter to the function satisfies these rules:

  1. It does not have any leading whitespace
  2. It might have trailing whitespace
  3. There might be interleaved whitespace in the string.

Goal: collapse duplicate interleaved whitespace and strip trailing whitespace.

This is how I am doing it now:

# toks - a priori no leading space
def squeeze(toks):
  import re
  p = re.compile(r'\W+')
  a = p.split( toks ) 
  for i in range(0, len(a)):
    if len(a[i]) == 0:
      del a[i]
  return ' '.join(a) 

>>> squeeze('Mary  Decker   is hot   ')
'Mary Decker is hot'

Can this be improved? Is it Pythonic enough?

This is how I would do it:

" ".join(toks.split())

PS. Is there a subliminal message in this question? ;-)
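
The one-liner in this answer can be checked with a minimal sketch. The function name `squeeze` is borrowed from the question; the last two inputs are made up for illustration:

```python
def squeeze(toks):
    # str.split() with no argument splits on runs of whitespace and
    # ignores leading/trailing whitespace, so no empty strings appear.
    return " ".join(toks.split())


print(repr(squeeze("Mary  Decker   is hot   ")))  # 'Mary Decker is hot'
print(repr(squeeze("x    !    y")))               # 'x ! y'
print(repr(squeeze("")))                          # ''
```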

Can't you use rstrip()?

some_string.rstrip() 

or strip() for stripping the string from both sides?

In addition: the strip() methods also accept an argument giving the characters to strip:

string.strip = strip(s, chars=None)
    strip(s [,chars]) -> string
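
As a short illustration of the methods mentioned above (the sample strings are invented for this sketch), the three variants behave as follows on a `str`:

```python
s = "  padded text \t\n"

right = s.rstrip()   # trailing whitespace only
left = s.lstrip()    # leading whitespace only
both = s.strip()     # both ends

print(repr(right))   # '  padded text'
print(repr(left))    # 'padded text \t\n'
print(repr(both))    # 'padded text'

# With an argument, strip() removes any of the given characters
# (treated as a set, not as a prefix or suffix) from both ends.
custom = "--==title==--".strip("-=")
print(repr(custom))  # 'title'
```

Note that none of these touch whitespace in the middle of the string.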

Related: if you need to strip whitespace in between as well: split the string, strip the terms and re-join them.

Reading the API helps!

To answer your questions literally:

Yes, it could be improved. The first improvement would be to make it work.

>>> squeeze('x    !    y')
'x y' # oops

Problem 1: You are using \W+ (non-word characters) when you should be using \s+ (whitespace characters)

>>> toks = 'x  !  y  z  '
>>> re.split(r'\W+', toks)
['x', 'y', 'z', '']
>>> re.split(r'\s+', toks)
['x', '!', 'y', 'z', '']

Problem 2: The loop to delete empty strings works, but only by accident. If you wanted a general-purpose loop to delete empty strings in situ, you would need to work backwards, otherwise your subscript i would get out of whack with the number of elements remaining. It works here because re.split() without a capturing group can produce empty elements only at the start and end. You have defined away the start problem, and the end case doesn't cause a problem because there have been no prior deletions. So you are left with a very ugly loop which could be replaced by two lines:

if a and not a[-1]: # guard against empty list
    del a[-1]
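
To make the point about working backwards concrete, here is a small sketch. The helper names `drop_empty_forward` and `drop_empty_backward` are hypothetical, and the forward version is buggy on purpose:

```python
def drop_empty_forward(items):
    # Buggy on purpose: the index range is fixed up front, but the
    # list shrinks every time an element is deleted.
    for i in range(len(items)):
        if not items[i]:
            del items[i]
    return items


def drop_empty_backward(items):
    # Walking from the end means a deletion never shifts the
    # elements that are still waiting to be visited.
    for i in range(len(items) - 1, -1, -1):
        if not items[i]:
            del items[i]
    return items


data = ["", "a", "", "b", ""]

try:
    drop_empty_forward(list(data))
except IndexError as exc:
    print("forward loop failed:", exc)

print(drop_empty_backward(list(data)))  # ['a', 'b']
```

With empty strings anywhere but the very end, the forward loop runs past the shortened list and raises IndexError.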

However unless your string is very long and you are worried about speed (in which case you probably shouldn't be using re), you'd probably want to allow for leading whitespace (assertions like "my data doesn't have leading whitespace" are ignored by convention) and just filter on the fly:

a = [x for x in p.split(toks) if x]

Next step is to avoid building the list a:

return ' '.join(x for x in p.split(toks) if x)

Now you did mention "Pythonic"... so let's throw out all that re import and compile overhead stuff, and the genexp, and just do this:

return ' '.join(toks.split())

Well, I tend not to use the re module if I can do the job reasonably with the built-in functions and features. For example:

def toks(s):
    return ' '.join([x for x in s.split(' ') if x])

... seems to accomplish the same goal with only the built-in split and join, plus a list comprehension to filter out empty elements of the split string.

Is that more "Pythonic"? I think so. However my opinion is hardly authoritative.

This could be done as a lambda expression as well; and I think that would not be Pythonic.

Incidentally this assumes that you want to ONLY squeeze out duplicate spaces and trim leading and trailing spaces. If your intent is to munge all whitespace sequences into single spaces (and trim leading and trailing) then change s.split(' ') to s.split(): passing no argument, or None, to the split() method is different from passing it a space.
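
The difference between the two calls shows up as soon as the input contains tabs or newlines. A minimal sketch, with a sample string invented for illustration:

```python
s = "a \t b\n\nc  "

# Splitting on a literal space keeps tabs and newlines inside the
# pieces and yields empty strings for consecutive spaces.
by_space = s.split(" ")
print(by_space)  # ['a', '\t', 'b\n\nc', '', '']

# Splitting with no argument treats any run of whitespace as one
# separator and never yields empty strings.
by_any = s.split()
print(by_any)    # ['a', 'b', 'c']

print(repr(" ".join(x for x in by_space if x)))  # 'a \t b\n\nc'
print(repr(" ".join(by_any)))                    # 'a b c'
```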

I know this question is old. But why not use regex?

import re

result = '  Mary  Decker   is hot   '
print(f"=={result}==")

result = re.sub(r'\s+$', '', result)  # strip trailing whitespace
print(f"=={result}==")

result = re.sub(r'^\s+', '', result)  # strip leading whitespace
print(f"=={result}==")

result = re.sub(r'\s+', ' ', result)  # collapse internal runs
print(f"=={result}==")

The output is:

==  Mary  Decker   is hot   ==
==  Mary  Decker   is hot==
==Mary  Decker   is hot==
==Mary Decker is hot==
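
The three substitutions above can also be folded into one helper. This is only a sketch: the names `_WHITESPACE_RUN` and `squeeze` are assumptions, and it swaps the two anchored substitutions for a plain `str.strip()` call:

```python
import re

# Compiled once at import time; matches any run of whitespace.
_WHITESPACE_RUN = re.compile(r"\s+")


def squeeze(text):
    # Collapse every whitespace run to one space, then drop the
    # single space that may remain at either end.
    return _WHITESPACE_RUN.sub(" ", text).strip()


print(f"=={squeeze('  Mary  Decker   is hot   ')}==")  # ==Mary Decker is hot==
```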

To make your code more Pythonic, you must realize that, a[i] being a string, instead of deleting a[i] if a[i] == '', it is better to keep a[i] if a[i] != ''.

So, instead of

def squeeze(toks):
    import re
    p = re.compile(r'\W+')
    a = p.split( toks )
    for i in range(0, len(a)):
        if len(a[i]) == 0:
            del a[i]
    return ' '.join(a)

write

def squeeze(toks):
    import re
    p = re.compile(r'\W+')
    a = p.split( toks )
    a = [x for x in a if x]
    return ' '.join(a)

and then

def squeeze(toks):
    import re
    p = re.compile(r'\W+')
    return ' '.join([x for x in p.split( toks ) if x])

Then, taking into account that a function can receive a generator as well as a list:

def squeeze(toks):
    import re
    p = re.compile(r'\W+')
    return ' '.join((x for x in p.split( toks ) if x))

and that doubling the parentheses isn't obligatory:

def squeeze(toks):
    import re
    p = re.compile(r'\W+')
    return ' '.join(x for x in p.split( toks ) if x)

Additionally, instead of obliging Python to run the import statement and rebind re in the namespace of the function squeeze() each time it is called (which is what an import inside the function body does), it would be better to import re once at module level and pass it as a default argument:

import re
def squeeze(toks,re = re):
    p = re.compile(r'\W+')
    return ' '.join(x for x in p.split( toks ) if x)

and, even better:

import re
def squeeze(toks,p = re.compile(r'\W+')):
    return ' '.join(x for x in p.split( toks ) if x)
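
The reason this works is that a default argument is evaluated once, when the def statement runs, not on every call. A small sketch that makes this visible; `make_pattern` and `compile_calls` are hypothetical names added only to count how often the pattern is built:

```python
import re

compile_calls = []  # hypothetical bookkeeping, only for this demonstration


def make_pattern():
    # Hypothetical helper: records each time the pattern is compiled.
    compile_calls.append(1)
    return re.compile(r"\W+")


def squeeze(toks, p=make_pattern()):
    # The default value of p was computed when the def statement ran.
    return " ".join(x for x in p.split(toks) if x)


print(squeeze("Mary  Decker   is hot   "))  # Mary Decker is hot
print(squeeze("x  y"))                      # x y
print(len(compile_calls))                   # 1
```

A module-level constant holding the compiled pattern would give the same effect without widening the function's signature.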

Remark: the if x part in the expression is useful only to drop the leading '' and the trailing '' that occur in the list p.split( toks ) when toks begins and ends with whitespace.

But, instead of splitting on what is unwanted, it is just as good to keep what is desired:

import re
def squeeze(toks,p = re.compile(r'\w+')):
    return ' '.join(p.findall(toks))
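
As a quick check that the two formulations agree (a sketch on an invented sample string), splitting on \W+ and filtering gives the same tokens as collecting \w+ matches directly:

```python
import re

text = "  x  !  y  z  "

# Split on the unwanted characters, then discard the empty strings
# produced at the start and end.
by_split = [t for t in re.split(r"\W+", text) if t]

# Collect the wanted characters directly; nothing to filter.
by_findall = re.findall(r"\w+", text)

print(by_split)    # ['x', 'y', 'z']
print(by_findall)  # ['x', 'y', 'z']
```

Both drop the '!', which is the flaw in the \W+ pattern discussed elsewhere in this thread.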

All that said, the pattern r'\W+' in your question is wrong for your purpose, as John Machin pointed out.

If you want to compress internal whitespace and remove trailing whitespace, whitespace being taken in its pure sense as the set of characters ' ', '\f', '\n', '\r', '\t', '\v' (see \s in re; for str patterns in Python 3, \s also matches other Unicode whitespace), you must replace your splitting with this one:

import re
def squeeze(toks,p = re.compile(r'\s+')):
    return ' '.join(x for x in  p.split( toks ) if x)

or, keeping the right substrings:

import re
def squeeze(toks,p = re.compile(r'\S+')):
    return ' '.join(p.findall(toks))

which is nothing other than the simpler and faster expression ' '.join(toks.split())

But if in fact you want to compress internal runs and remove trailing runs of only the characters ' ' and '\t', keeping the newlines untouched, you will use
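
A rough way to check both the equivalence and the speed claim is sketched below. The function and constant names are assumptions made for this comparison, the sample strings are plain ASCII, and the timings depend on the machine, so no figures are quoted:

```python
import re
import timeit

WS_SPLIT = re.compile(r"\s+")
NON_WS = re.compile(r"\S+")


def squeeze_split(toks):
    return " ".join(x for x in WS_SPLIT.split(toks) if x)


def squeeze_findall(toks):
    return " ".join(NON_WS.findall(toks))


def squeeze_plain(toks):
    return " ".join(toks.split())


# All three agree on these ASCII samples.
samples = ["Mary  Decker   is hot   ", "x  !\ty\n z", "", "   "]
for s in samples:
    assert squeeze_split(s) == squeeze_findall(s) == squeeze_plain(s)

# Time each variant on a longer input; lower is faster.
text = "Mary  Decker   is hot   " * 100
for fn in (squeeze_split, squeeze_findall, squeeze_plain):
    print(fn.__name__, timeit.timeit(lambda: fn(text), number=1000))
```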

import re
def squeeze(toks,p = re.compile(r'[^ \t]+')):
    return ' '.join(p.findall(toks))

and that can't be replaced by a plain toks.split() call.
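
A short sketch of how that last variant treats newlines. The name `squeeze_blanks` and the sample text are assumptions for illustration:

```python
import re


def squeeze_blanks(toks, p=re.compile(r"[^ \t]+")):
    # Tokens are runs of anything except space and tab, so a newline
    # stays attached to the token it touches.
    return " ".join(p.findall(toks))


text = "Mary \t Decker  \nis   hot \t "

print(repr(squeeze_blanks(text)))        # 'Mary Decker \nis hot'
print(repr(" ".join(text.split())))      # 'Mary Decker is hot'
```

The newline survives, but blanks just before it are reduced to one space rather than removed, so per-line trailing blanks would need separate handling.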


