
Python removing delimiters from strings

I have two related questions.

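def remove_delimiters(delimiters, s):
    # Delete every occurrence of each delimiter outright.
    # Note: this glues the neighbouring words together (question 1 below).
    for d in delimiters:
        ind = s.find(d)
        while ind != -1:
            # len(d) so a multi-character delimiter such as '...' is removed whole
            s = s[:ind] + s[ind + len(d):]
            ind = s.find(d)

    # Collapse runs of whitespace into single spaces.
    return ' '.join(s.split())


delimiters = [",", ".", "!", "?", "/", "&", "-", ":", ";", "@", "'", "..."]
d_dataset_list = ['hey-you...are you ok?']
d_list = []

for d in d_dataset_list:
    # Pass the whole string d (not a single character of it).
    d_list.append(remove_delimiters(delimiters, d))

print(d_list)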

Output: ['heyyouare you ok']

  1. What is the best way to avoid words being joined together when a delimiter is removed? For example, so that the output is 'hey you are you ok'.

  2. There may be many different runs of dots besides ..., for example .. or .........., etc. How would I go about implementing a rule that removes any sequence of more than one . in a row? I want to avoid hard-coding every such sequence in my delimiters list. Thank you.

You could try something like this:

  1. Given the delimiters d, join them into a regular expression character class:

     >>> d = ",.!?/&-:;@'..."
     >>> "["+"\\".join(d)+"]"
     "[,\\.\\!\\?\\/\\&\\-\\:\\;\\@\\'\\.\\.\\.]"

  2. Split the string on this regex with re.split:

     >>> import re
     >>> s = 'hey-you...are you ok?'
     >>> re.split("["+"\\".join(d)+"]", s)
     ['hey', 'you', '', '', 'are you ok', '']

  3. Join all the non-empty fragments back together:

     >>> ' '.join(w for w in re.split("["+"\\".join(d)+"]", s) if w)
     'hey you are you ok'
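The three steps above can be wrapped into one reusable function. This is a sketch, not the answer's exact code: it swaps the manual backslash join for re.escape, which escapes each delimiter safely, builds an alternation so a multi-character delimiter such as "..." can match as a unit, and adds a + quantifier (plus whitespace) so a run of delimiters counts as a single break. The name split_on_delimiters is hypothetical.

```python
import re


def split_on_delimiters(delimiters, s):
    # Hypothetical helper sketching the split / filter / join steps above.
    # re.escape makes '.', '?', '-' etc. match literally; sorting longest
    # first lets a multi-character delimiter such as '...' match as a unit.
    escaped = [re.escape(d) for d in sorted(delimiters, key=len, reverse=True)]
    # One or more delimiters or whitespace characters form a single break.
    pattern = r"(?:\s|" + "|".join(escaped) + ")+"
    # Split, drop the empty fragments at the edges, rejoin with one space.
    return " ".join(w for w in re.split(pattern, s) if w)


delimiters = [",", ".", "!", "?", "/", "&", "-", ":", ";", "@", "'", "..."]
print(split_on_delimiters(delimiters, "hey-you...are you ok?"))  # hey you are you ok
print(split_on_delimiters(delimiters, "wait.......what?!"))      # wait what
```

Because the group is non-capturing, re.split does not insert the matched delimiters into the result list.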

Also, if you just want to remove all non-word characters, you can use the character class \W instead of enumerating all the delimiters by hand:

>>> ' '.join(w for w in re.split(r"\W", s) if w)
'hey you are you ok'
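One caveat with \W: it matches any character that is not a word character, and word characters include digits and the underscore (and, for str patterns in Python 3, Unicode letters). Those therefore survive the split, while an apostrophe inside a contraction does not. A small sketch using \W+ so each run of non-word characters is one break; the function name keep_words and the sample strings are made up for illustration:

```python
import re


def keep_words(s):
    # Hypothetical helper. \W+ matches a run of non-word characters
    # (anything outside letters, digits and '_'), so a run splits only once.
    return " ".join(w for w in re.split(r"\W+", s) if w)


print(keep_words("hey-you...are you ok?"))  # hey you are you ok
print(keep_words("it's_ok 42!"))            # it s_ok 42
print(keep_words("café...naïve"))           # café naïve
```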

First of all, your function for removing delimiters can be simplified greatly by using the string replace method (http://www.tutorialspoint.com/python/string_replace.htm).

This also helps with your first question: instead of just removing each delimiter, replace it with a space, then get rid of the extra spaces using the pattern you already used (split() with no arguments treats any run of consecutive whitespace as a single separator).

A better function that does this would be:

def remove_delimiters(delimiters, s):
    new_s = s
    for d in delimiters:
        # Replace each delimiter in turn with a space.
        new_s = new_s.replace(d, ' ')
    # split() collapses the resulting runs of spaces.
    return ' '.join(new_s.split())
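When every delimiter is a single character, the replace loop can also be done in one pass with str.maketrans and str.translate. This is a swapped-in technique, not part of the answer above, and assumes Python 3. Multi-character entries such as "..." are skipped in this sketch, which is harmless here only because "." itself is in the list. The name remove_delimiters_translate is hypothetical.

```python
def remove_delimiters_translate(delimiters, s):
    # Hypothetical variant: map each single-character delimiter to a space.
    # Multi-character entries such as '...' are skipped; here they are
    # covered anyway because '.' itself is in the list.
    table = str.maketrans({d: " " for d in delimiters if len(d) == 1})
    # str.split() with no arguments collapses runs of whitespace.
    return " ".join(s.translate(table).split())


delimiters = [",", ".", "!", "?", "/", "&", "-", ":", ";", "@", "'", "..."]
print(remove_delimiters_translate(delimiters, "hey-you...are you ok?"))
# hey you are you ok
```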

To answer your second question, I'd say it's time for regular expressions:

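>>> import re
>>> ss = 'hey ... you are ....... what?'
>>> print(re.sub(r'\.+', ' ', ss))
hey   you are   what?

Here \.+ matches a run of one or more consecutive dots, so every such run, whatever its length, is replaced by a single space.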
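If only runs of two or more dots should go, so that a single dot (as in 3.14 or the end of a sentence) survives, a bounded quantifier does that. A sketch with a hypothetical function name and made-up sample text, which also collapses the leftover spaces the same way as the remove_delimiters function above:

```python
import re


def remove_dot_runs(s):
    # Hypothetical helper. \.{2,} matches two or more consecutive dots, so
    # '..', '...' and '..........' are all replaced; a lone '.' is kept.
    s = re.sub(r"\.{2,}", " ", s)
    # Collapse any doubled spaces left behind.
    return " ".join(s.split())


print(remove_dot_runs("hey ... you are ....... what?"))  # hey you are what?
print(remove_dot_runs("pi is 3.14... right.."))          # pi is 3.14 right
```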
