
Chapter 7, Automate the Boring Stuff with Python, practice project: regex version of strip()

I am reading the book "Automate the Boring Stuff with Python". In Chapter 7, for the practice project "Regex version of strip()", here is my code (I am using Python 3.x):

def stripRegex(x,string):
    import re
    if x == '':
        spaceLeft = re.compile(r'^\s+')
        stringLeft = spaceLeft.sub('',string)
        spaceRight = re.compile(r'\s+$')
        stringRight = spaceRight.sub('',string)
        stringBoth = spaceRight.sub('',stringLeft)
        print(stringLeft)
        print(stringRight)

    else:
        charLeft = re.compile(r'^(%s)+'%x)
        stringLeft = charLeft.sub('',string)
        charRight = re.compile(r'(%s)+$'%x)
        stringBoth = charRight.sub('',stringLeft)
    print(stringBoth)

x1 = ''
x2 = 'Spam'
x3 = 'pSam'
string1 = '      Hello world!!!   '
string2 = 'SpamSpamBaconSpamEggsSpamSpam'
stripRegex(x1,string1)
stripRegex(x2,string2)
stripRegex(x3,string2)

This is the output:

Hello world!!!   
      Hello world!!!
Hello world!!!
BaconSpamEggs
SpamSpamBaconSpamEggsSpamSpam

So my regex version of strip() works almost like the original. With the original, the output is always "BaconSpamEggs" no matter whether you pass in 'Spam', 'pSam', 'mapS', or 'Smpa'. How can I fix this in the regex version?

import re

def regexStrip(x,y=''):
    if y!='':
        yJoin=r'['+y+']*([^'+y+'].*[^'+y+'])['+y+']*'
        cRegex=re.compile(yJoin,re.DOTALL)
        return cRegex.sub(r'\1',x)
    else:
        sRegex=re.compile(r'\s*([^\s].*[^\s])\s*',re.DOTALL)
        return sRegex.sub(r'\1',x)

text='  spmaHellow worldspam'
print(regexStrip(text,'spma'))

You can match any of several characters in a regex by using a character class, like this:

import re

charLeft = re.compile(r'^([%s]+)' % 'abc')
print(charLeft.sub('', "aaabcfdsfsabca"))
# fdsfsabca
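
The difference between the group used in the question and the character class used here can be seen side by side. This is a minimal sketch that reuses the question's sample string; the variable names are only illustrative.

```python
import re

sample = 'SpamSpamBaconSpamEggsSpamSpam'

# (pSam)+ matches the literal sequence "pSam" repeated, which never occurs here.
as_group = re.sub(r'^(pSam)+|(pSam)+$', '', sample)

# [pSam]+ matches any run of the characters p, S, a, m, in any order.
as_class = re.sub(r'^[pSam]+|[pSam]+$', '', sample)

print(as_group)  # SpamSpamBaconSpamEggsSpamSpam
print(as_class)  # BaconSpamEggs
```

The character class behaves like the argument of str.strip(), which is also treated as a set of characters rather than as a prefix or suffix.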

Or, even better, do it with a single regex:

def strip_custom(x, text):
    return re.search(' *[{s}]*(.*?)[{s}]* *$'.format(s=x), text).group(1)

print(strip_custom('abc', ' aaabtestbcaa '))
# test

I changed the arguments, but from my quick tests this seems to work. I gave it an optional argument that defaults to None.

def stripRegex(s,toStrip=None):
    import re
    if toStrip is None:
        toStrip = r'\s'
    return re.sub(r'^[{0}]+|[{0}]+$'.format(toStrip), '', s)

x1 = ''
x2 = 'Spam'
x3 = 'pSam'
string1 = '      Hello world!!!   '
string2 = 'SpamSpamBaconSpamEggsSpamSpam'

print(stripRegex(string1)) # 'Hello world!!!'
print(stripRegex(string1, x1)) # '      Hello world!!!   '
print(stripRegex(string2, x2)) # 'BaconSpamEggs'
print(stripRegex(string2, x3)) # 'BaconSpamEggs'
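
One thing the version above does not handle is characters that have a special meaning inside a character class, such as ], ^ or -. A possible variant, sketched here under the assumption that the argument should always be treated as literal characters, passes it through re.escape() first. The name strip_escaped is hypothetical and not part of the answer.

```python
import re

def strip_escaped(s, chars=None):
    """Hypothetical variant: escape chars before building the character class."""
    if chars is None:
        char_class = r'\s'      # default: any whitespace
    elif chars == '':
        return s                # nothing to strip, same as str.strip('')
    else:
        char_class = re.escape(chars)
    return re.sub(r'^[{0}]+|[{0}]+$'.format(char_class), '', s)

print(strip_escaped('[^-]value[-^]', '[]^-'))  # value
print(strip_escaped('  padded  '))             # padded
```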

I wrote two different versions of the same function. The first way:

import re
def stripfn(string, c):
    if c != '':
        Regex = re.compile(r'^['+ c +']*|['+ c +']*$')
        strippedString = Regex.sub('', string)
        print(strippedString)
    else:
        blankRegex = re.compile(r'^(\s)*|(\s)*$')
        strippedString = blankRegex.sub('', string)
        print(strippedString)

The second way:

import re
def stripfn(string, c):
    if c != '':
        startRegex = re.compile(r'^['+c+']*')
        endRegex = re.compile(r'['+c+']*$')
        startstrippedString = startRegex.sub('', string)
        endstrippedString = endRegex.sub('', startstrippedString)
        print(endstrippedString)
    else:
        blankRegex = re.compile(r'^(\s)*|(\s)*$')
        strippedString = blankRegex.sub('', string)
        print(strippedString)

This seems to work:

def stripp(text, leftright = None):
    import re
    if leftright is None:
        stripRegex = re.compile(r'^\s*|\s*$')
        text = stripRegex.sub('', text)
        print(text)
    else:
        # findall returns the first and the last character of text
        stripRegex = re.compile(r'^.|.$')
        margins = stripRegex.findall(text)
        # "margins and" guards against an empty list once text is used up
        while margins and margins[0] in leftright:
            text = text[1:]
            margins = stripRegex.findall(text)
        while margins and margins[-1] in leftright:
            text = text[:-1]  # drop exactly one trailing character per pass
            margins = stripRegex.findall(text)
        print(text)

mo = '    @@@@@@     '
mow = '@&&@#$texttexttext&&^&&&&%%'
bla = '@&#$^%+'

stripp(mo)
stripp(mow, bla)

Here is my version:

#!/usr/bin/env python3

import re

def strippp(txt,arg=''): # assigning a default value to arg prevents the error if no argument is passed when calling strippp()
    if arg =='':
        regex1 = re.compile(r'^(\s+)')
        mo = regex1.sub('', txt)
        regex2 = re.compile(r'(\s+)$')
        mo = regex2.sub('', mo)
        print(mo)
    else:
        # Anchor a character class to both ends so only leading and trailing
        # characters are removed, as str.strip() does.
        regex1 = re.compile(r'^[{0}]+|[{0}]+$'.format(re.escape(arg)))
        mo = regex1.sub('', txt)
        print(mo)

text = '        So, you can create the illusion of smooth motion        '
strippp(text, 'e')
strippp(text)

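@rtemperv's solution misses the case where the string starts or ends with whitespace but whitespace is not among the characters to remove.

For example: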

>>> var="     foobar"
>>> var.strip('raf')
'     foob'

So the regex should be slightly different:

def strip_custom(x, text):
    return re.search('^[{s}]*(.*?)[{s}]*$'.format(s=x), text).group(1)
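
A quick way to check the difference between the two patterns is to run both against str.strip() on the example above. This is only a sketch; the helper names with_spaces and anchored are made up for the comparison.

```python
import re

def with_spaces(chars, text):
    # Earlier pattern: always swallows spaces around the stripped characters.
    return re.search(' *[{s}]*(.*?)[{s}]* *$'.format(s=chars), text).group(1)

def anchored(chars, text):
    # Revised pattern: only the given characters are removed.
    return re.search('^[{s}]*(.*?)[{s}]*$'.format(s=chars), text).group(1)

var = '     foobar'
print(repr(var.strip('raf')))         # '     foob'
print(repr(with_spaces('raf', var)))  # 'oob'
print(repr(anchored('raf', var)))     # '     foob'
```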

See the code below:

from re import *
check = '1'
while(check == '1'):
    string = input('Enter the string: ')
    strToStrip = input('Enter the string to strip: ')
    if strToStrip == '':                              #If the string to strip is empty
        exp = compile(r'^[\s]*')                      #Looks for all kinds of spaces in beginning until anything other than that is found
        string = exp.sub('',string)                   #Replaces that with empty string
        exp = compile(r'[\s]*$')                      #Looks for all kinds of spaces in the end until anything other than that is found
        string = exp.sub('',string)                   #Replaces that with empty string
        print('Your Stripped string is \'', end = '')
        print(string, end = '')
        print('\'')
    else:
        exp = compile(r'^[%s]*'%strToStrip)           #Finds all instances of the characters in strToStrip in the beginning until anything other than that is found
        string = exp.sub('',string)                   #Replaces it with empty string
        exp = compile(r'[%s]*$'%strToStrip)           #Finds all instances of the characters in strToStrip in the end until anything other than that is found
        string = exp.sub('',string)                   #Replaces it with empty string
        print('Your Stripped string is \'', end = '')
        print(string, end = '')
        print('\'')
    print('Do you want to continue (1\\0): ', end = '')
    check = input()

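Explanation:

  • The character class [] is used to match individual instances of the characters in the string to strip.

  • ^ is used to check whether characters from the string to strip appear at the beginning.

  • $ is used to check whether characters from the string to strip appear at the end.
  • If found, they are replaced with an empty string by sub().

  • * is used to match as many characters from the string to strip as possible, until any other character is found.

  • * matches zero instances if none are found, or as many instances as possible if some are found.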
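
The points above can be tried out in isolation. The following is a small sketch with a made-up set of characters and sample text, not a replacement for the program above.

```python
import re

chars = 'xy'                       # hypothetical characters to strip
text = 'xyxHello, xy worldyyx'

leading = re.compile(r'^[%s]*' % chars)   # ^ anchors the class at the start
trailing = re.compile(r'[%s]*$' % chars)  # $ anchors the class at the end

print(repr(leading.sub('', text)))                    # 'Hello, xy worldyyx'
print(repr(trailing.sub('', leading.sub('', text))))  # 'Hello, xy world'
print(repr(leading.sub('', 'Hello')))                 # 'Hello' (* matched zero characters)
```

The "xy" in the middle of the text is left alone because neither anchor can be satisfied there.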

#! python
# Regex Version of Strip()
import re
def RegexStrip(mainString,charsToBeRemoved=None):
    if charsToBeRemoved is not None:
        # Anchor the character class so only the ends of the string are stripped.
        regex=re.compile(r'^[%s]+|[%s]+$'%(charsToBeRemoved,charsToBeRemoved))
        return regex.sub('',mainString)
    else:
        regex=re.compile(r'^\s+')   # leading whitespace
        regex1=re.compile(r'\s+$')  # trailing whitespace
        newString=regex1.sub('',mainString)
        newString=regex.sub('',newString)
        return newString

Str='   hello3123my43name is antony    '
print(RegexStrip(Str))

I think this is fairly comfortable code; I found the caret (^) and the dollar sign ($) very effective.

import re
def strips(arg, string):
    beginning = re.compile(r"^[{}]+".format(arg))        
    strip_beginning = beginning.sub("", string)
    ending = re.compile(r"[{}]+$".format(arg))
    strip_ending = ending.sub("", strip_beginning)
    return strip_ending

The function strips removes the characters in "arg" from both ends of the string, however many times they occur there.

I believe this regex may be easier to understand:

import re

strip_reg = re.compile(r"\s*(.*?)\s*$")
strip_reg.search(mystring).group(1)  # mystring is the string to strip

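How does this work? Let's go backwards. At the end of the string, "\s*$" looks for zero or more whitespace characters.

The ".*?" is a special case: it asks the regex to match as few characters as possible. (Most of the time, a regex tries to match as many as it can.) This is the part we capture.

Before the captured group, the leading "\s*" matches zero or more whitespace characters.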
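
The three parts described above can be made visible by giving each its own group. This is a sketch only; the extra groups and the sample string are added for illustration and are not part of the answer's pattern.

```python
import re

# Same shape as the pattern above, with the two \s* parts captured as well.
parts = re.compile(r'(\s*)(.*?)(\s*)$')

m = parts.search('   keep  inner  spaces \t')
print(repr(m.group(1)))  # '   '
print(repr(m.group(2)))  # 'keep  inner  spaces'
print(repr(m.group(3)))  # ' \t'
```

Whitespace inside the text survives because the lazy group only stops once everything after it is whitespace up to the end of the string.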

My solution:

import re

text = """
 Write a function that takes a string and does the same thing as the strip() 
string method. If no other arguments are passed other than the string to 
strip, then whitespace characters will be removed from the beginning and 
end of the string. Otherwise, the characters specified in the second argument
to the function will be removed from the string.
"""

def regexStrip(text, charsToStrip=''):
    if not charsToStrip:
        strip = re.sub(r'^\s+|\s+$', '', text)
    else:
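        # Anchor a character class to both ends so the middle of the text is left alone.
        strip = re.sub(r'^[{0}]+|[{0}]+$'.format(re.escape(charsToStrip)), '', text)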
    return strip

while True:
    arg2 = input('Characters to strip: ')
    print(regexStrip(text, arg2))

#!/usr/bin/python3
# my_strip.py - Perform strip function capability with regex
import re

def myStrip(text, character=' '):
    # Strip whitespace by default or user's argument 
    stripCharRegex = re.compile(r'^[%s]*(.*?)[%s]*$'%(character,character)) # (.*?) Will match the least possible of any character (non-greedy)
    return stripCharRegex.search(text).group(1)

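I am using a single regex to strip either whitespace or the optional characters. If you do not understand %s, look up string interpolation. We want (.*?) to match as few characters as possible (non-greedy). Remove the ? and see what happens.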
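
For readers who want to see the effect of removing the ? without editing the function, here is a small side-by-side sketch. The sample text and the names lazy and greedy are chosen only for this illustration.

```python
import re

text = 'xxhelloxx'

lazy = re.compile(r'^[x]*(.*?)[x]*$')    # same shape as the pattern in myStrip
greedy = re.compile(r'^[x]*(.*)[x]*$')   # same pattern with the ? removed

print(lazy.search(text).group(1))    # hello
print(greedy.search(text).group(1))  # helloxx
```

In the greedy version the group swallows the trailing characters itself, so the final [x]* has nothing left to match and nothing is stripped on the right.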

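Below is my attempt to apply the lessons I learned from R. C. Martin's "Clean Code" and Al Sweigart's "Automate the Boring Stuff". One of the rules of clean code is to write small functions that do one thing.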

import re

def removeSpacesAndSecondString(text):
    print(text)
    stripSecondStringRegex = re.compile(r'((\w+)\s(\w+)?)')
    for groups in stripSecondStringRegex.findall(text):
        newText = groups[1]
    print(newText)

def removeSpaces(text):
    print(text)
    stripSpaceRegex = re.compile(r'\s')
    mo = stripSpaceRegex.sub('', text)
    print(mo)

text = '"  hjjkhk  "'

if len(text.split()) > 1:
    removeSpacesAndSecondString(text)
else:
    removeSpaces(text)
