How to split a string at word/number and symbol pattern from the right

I am trying to split a string that may look like this:

A Fool (SEVEN000) (and His Money are S00n) Parted 

Into: "A Fool (7000) (and His Money are" and "S00n) Parted", using Python.

The ) will always be present at the end of the string and will always be preceded by a word/number. I was thinking that splitting it from the right using a [word/number]) pattern would work.

Edit:

As requested, here are a few more examples:

Right (Out of the) Gate 

Expected Output: Right (Out of the) Gate

Right (Out) (of the Gate at 12PM)

Expected Output: Right (Out of the Gate at 12PM)

You seem to be splitting your string at the last space inside your parentheses. You can use this regex:

 (?=[^()]*\))(?=\S*\))

Check this Python code:

import re

s = 'A Fool (SEVEN000) (and His Money are S00n) Parted'
arr = re.split(r' (?=[^()]*\))(?=\S*\))', s)
print(arr)

It prints what you wanted:

['A Fool (SEVEN000) (and His Money are', 'S00n) Parted']
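
As a quick check, the same regex can be run against the other examples from the question. This is only a sketch: the names PATTERN, examples and results are illustrative and not part of the answer above.

```python
import re

# Split at a space that sits inside parentheses and directly precedes
# the last token before the closing parenthesis.
PATTERN = re.compile(r' (?=[^()]*\))(?=\S*\))')

examples = [
    'A Fool (SEVEN000) (and His Money are S00n) Parted',
    'Right (Out of the) Gate',
    'Right (Out) (of the Gate at 12PM)',
]

results = [PATTERN.split(s) for s in examples]
for s, parts in zip(examples, results):
    print(s, '->', parts)
```

Note that the pattern matches in every parenthesized group that contains a space, so a string such as '(a b) (c d)' is split into three pieces rather than two.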

Here is one option using re.split with a positive lookahead. The pattern I use is:

\s+(?=\w+\)(?:\s|$))

This pattern says to split on, and consume, one or more whitespace characters when what follows is one or more word characters, followed by a closing parenthesis and then either whitespace or the end of the input.

import re

text = "A Fool (SEVEN000) (and His Money are S00n) Parted"
parts = re.split(r'\s+(?=\w+\)(?:\s|$))', text)
print(parts)

['A Fool (SEVEN000) (and His Money are', 'S00n) Parted']
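
Because the separator is \s+ rather than a single space, runs of mixed whitespace are consumed as one separator. The sketch below illustrates this; the helper name split_before_closing_word is hypothetical.

```python
import re

# One or more whitespace characters, followed by word characters,
# a closing parenthesis, and then whitespace or the end of the input.
PATTERN = re.compile(r'\s+(?=\w+\)(?:\s|$))')

def split_before_closing_word(text):
    """Hypothetical helper: split text with the pattern above."""
    return PATTERN.split(text)

# The closing parenthesis is the last character of the input.
print(split_before_closing_word('Right (Out) (of the Gate at 12PM)'))
# Two spaces, a tab and a space are consumed as a single separator.
print(split_before_closing_word('Right (Out of  \t the) Gate'))
# No whitespace directly precedes a "word)" token, so nothing is split.
print(split_before_closing_word('Right (Out) Gate'))
```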

Use the following regular expression, and take substrings at the index where it matches:

\b[A-Za-z0-9]+\) [A-Za-z0-9]+$

(This assumes that after the closing bracket there is only a single word. If that's not true, you would need to give more information so I can update the answer.)
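
One way this could look in Python is sketched below, using re.search and the start index of the match. The names TAIL and split_at_tail are illustrative, and trimming the separating whitespace with rstrip() is an assumption about the desired output.

```python
import re

# Final word/number, closing parenthesis, one space, one trailing word.
TAIL = re.compile(r'\b[A-Za-z0-9]+\) [A-Za-z0-9]+$')

def split_at_tail(text):
    """Hypothetical helper: cut text where the TAIL pattern starts."""
    match = TAIL.search(text)
    if match is None:
        return None  # the string does not end in "word) word"
    cut = match.start()
    # Drop the whitespace that separated the two halves.
    return text[:cut].rstrip(), text[cut:]

print(split_at_tail('A Fool (SEVEN000) (and His Money are S00n) Parted'))
print(split_at_tail('Right (Out) (of the Gate at 12PM)'))
```

The second call returns None, because that string ends with the closing parenthesis instead of a trailing word, which is exactly the case the assumption above does not cover.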

I would do it the following way:

import re
text = 'A Fool (SEVEN000) (and His Money are S00n) Parted'
parted = re.findall(r'(.+)\s+(\S+\)[^\)]*$)',text)[0]
print(parted)

The output is the following 2-tuple:

('A Fool (SEVEN000) (and His Money are', 'S00n) Parted')

To understand my regular expression, it can be broken down into:

1st group: .+

separator: \s+

2nd group: \S+\)[^\)]*$

The first group matches at least one character that is not a newline \n. The separator matches at least one whitespace character (not only a space, but also carriage return \r, tab \t and so on). Lastly, and most importantly, the second group consists of at least one non-whitespace character, followed by ), followed by zero or more characters that are not ), spanning to the end of the string as denoted by $. If you want only spaces instead of any whitespace character, replace \s with a literal space and \S with [^ ].
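
The breakdown can also be written directly into the pattern with re.VERBOSE, as in the sketch below. The group names head and tail and the constant names are illustrative; SPACES_ONLY is the spaces-only substitution described above.

```python
import re

# Same pattern, annotated piece by piece.
PATTERN = re.compile(r'''
    (?P<head>.+)            # 1st group: one or more non-newline characters
    \s+                     # separator: one or more whitespace characters
    (?P<tail>\S+\)[^\)]*$)  # 2nd group: non-whitespace, ")", then no more ")"
''', re.VERBOSE)

# Spaces-only variant: \s replaced by a space, \S replaced by [^ ].
SPACES_ONLY = re.compile(r'(.+) +([^ ]+\)[^\)]*$)')

text = 'A Fool (SEVEN000) (and His Money are S00n) Parted'
match = PATTERN.search(text)
if match:
    print(match.group('head'))
    print(match.group('tail'))

# With a tab inside the parentheses the two variants cut at different places.
tabbed = 'a (b\tc) d'
print(PATTERN.search(tabbed).groups())
print(SPACES_ONLY.search(tabbed).groups())
```

In the tabbed example the first pattern treats the tab as a separator, while the spaces-only variant has to fall back to the space after "a".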

If the separator is only a space, we can do it without regex. Maybe like this, using rfind():

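def splitter(a_string):
    idx1 = a_string.rfind(')')           # last closing parenthesis
    idx2 = a_string.rfind(' ', 0, idx1)  # last space before it
    idx3 = a_string.rfind('(', 0, idx1)  # last opening parenthesis before it
    # Split only if that space lies inside the final pair of parentheses.
    if (idx2 > -1) and (idx3 < idx2):
        return (a_string[:idx2], a_string[idx2:])
    else:
        return None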

input: splitter('Right (Out) (of the Gate at 12PM)')

output: ('Right (Out) (of the Gate at', ' 12PM)')

input: splitter('Right (Out)')

output: None
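
Note that the second element returned by splitter() keeps the leading space. If that is not wanted, a variant along these lines might work. The function name is hypothetical, and returning None when there is no closing parenthesis at all is an added assumption.

```python
def split_last_word_in_parens(text):
    """Hypothetical variant of splitter() that drops the separating space."""
    close = text.rfind(')')
    if close == -1:
        return None  # no closing parenthesis at all
    space = text.rfind(' ', 0, close)
    opening = text.rfind('(', 0, close)
    # Split only when the last space lies inside the final parentheses.
    if space == -1 or space < opening:
        return None
    return text[:space], text[space + 1:]

print(split_last_word_in_parens('Right (Out) (of the Gate at 12PM)'))
print(split_last_word_in_parens('Right (Out)'))
```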
