
Partition given string using regex

Trying to break the string into 2 parts at the first "/".

# Raw strings (r'...') keep each backslash literal and avoid invalid-escape warnings.

# Need to get 'I1234' and 'I56/I78'
name1 = 'I1234/I56/I78'

# Need to get '\I1234 ' and 'I56/I78'
name2 = r'\I1234 /I56/I78'

# Need to get '\I1234 ' and '\I56 /I78'
name3 = r'\I1234 /\I56 /I78'

# Need to get '\I1234 ' and '\I56 /\I78 '
name4 = r'\I1234 /\I56 /\I78 '

I tried this, and it worked:

import re

pat_a = re.compile(r'(.+)(/)(.+)')
result = re.findall(pat_a, name2[::-1])

Is there a better way?

EDIT

There are more complicated strings possible, for example:

\I78_[0]/abcd_/efg_ /I1234/I56

Not sure if it's better, but you can use partition, or split with maxsplit=1, to avoid importing the re module:

print('I1234/I56/I78'.partition("/"))   # ('I1234', '/', 'I56/I78')

print('I1234/I56/I78'.split("/",1))     # ['I1234', 'I56/I78']

For partition you need the 0th and 2nd elements of the returned tuple (the 1st is the separator itself):

first, _ , last = 'I1234/I56/I78'.partition("/")
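
A minimal sketch of wrapping that unpacking in a helper. The name split_first is hypothetical (it does not come from the question). It also shows what partition returns when the input contains no "/": the whole string, followed by an empty tail.

```python
def split_first(path, sep="/"):
    """Split path at the first occurrence of sep; the separator is dropped."""
    head, _, tail = path.partition(sep)
    return head, tail


print(split_first("I1234/I56/I78"))       # ('I1234', 'I56/I78')
print(split_first(r"\I1234 /\I56 /I78"))  # ('\\I1234 ', '\\I56 /I78')
print(split_first("no_separator"))        # ('no_separator', '')
```

Unlike split with maxsplit=1, this always yields exactly two values, so the unpacking never fails on input that has no separator.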

Docs: str.partition(sep) and str.split(sep=None, maxsplit=-1)


Full example:

# Raw strings keep the backslashes literal
name1 = 'I1234/I56/I78'
name2 = r'\I1234 /I56/I78'
name3 = r'\I1234 /\I56 /I78'
name4 = r'\I1234 /\I56 /\I78 '

for n in [name1, name2, name3, name4]:
    print(n.partition("/"))   # e.g. ('I1234', '/', 'I56/I78') for name1
    print(n.split("/", 1))    # e.g. ['I1234', 'I56/I78'] for name1

Output (the backslashes appear doubled because tuples and lists print the repr of their elements, which escapes backslashes; each string still contains a single backslash):

('I1234', '/', 'I56/I78')           # using partition
['I1234', 'I56/I78']                # using split

('\\I1234 ', '/', 'I56/I78')        # partition
['\\I1234 ', 'I56/I78']             # split .. etc.

('\\I1234 ', '/', '\\I56 /I78')
['\\I1234 ', '\\I56 /I78']

('\\I1234 ', '/', '\\I56 /\\I78 ')
['\\I1234 ', '\\I56 /\\I78 ']
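
To check that the doubling is only a display effect, compare the repr of one element with what print shows for the element itself. A small sketch using name2 from the question, written as a raw string so the backslash is taken literally:

```python
name2 = r"\I1234 /I56/I78"

first, _, last = name2.partition("/")

print(repr(first))        # repr form, as shown inside tuples and lists
print(first)              # the actual text, with a single backslash
print(len(first))         # 7: one backslash, 'I1234', one trailing space
print(first.count("\\"))  # 1
```

So the values returned by partition and split already match what the question asks for; only the container's printed form differs.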

This answer uses str.split, which seems cleaner than a regex here. I looked at str.partition, but it returns a 3-tuple that includes the separator, so you have to index or unpack it, and its printed output is not the pair of strings that you requested.

This first example takes a single string and outputs a pair of strings based on your split request.

# Need to get '\I1234 ' and '\I56 /I78'
name3 = r'\I1234 /\I56 /I78'

# The input name (name3) can be changed in a for loop linked to your input.
split_input = name3.split('/', 1)  # maxsplit=1
print(split_input)
# outputs
#####################################################################
# NOTE: the backslashes are shown escaped (doubled), which doesn't
# match the display in your requirement.
#####################################################################
# ['\\I1234 ', '\\I56 /I78']

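The output above shows escaped (doubled) backslashes because printing a list shows the repr of each element; the strings themselves contain single backslashes. If you want the printed text without the doubling, this code converts the list to a string and decodes the escapes.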

# Need to get '\I1234 ' and '\I56 /I78'
name3 = r'\I1234 /\I56 /I78'

# The input name (name3) can be changed in a for loop linked to your input.
# Note: after the decode step, split_input is a str, not a list.
split_input = str(name3.split('/', 1)).encode('utf-8').decode('unicode_escape')
print(split_input)
# outputs
# ['\I1234 ', '\I56 /I78']   (Do you need that trailing space?)
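
One caveat with the encode/decode round trip: unicode_escape reads the bytes as Latin-1, so any non-ASCII character in the input comes back mangled. If the goal is only a display without doubled backslashes, a sketch like the following formats the elements directly and leaves the list untouched. The helper name show is hypothetical, and the input containing 'é' is made up purely to illustrate the problem.

```python
def show(parts):
    """Format a list of strings like a list literal, without escaping backslashes."""
    return "[" + ", ".join("'" + p + "'" for p in parts) + "]"


name3 = r"\I1234 /\I56 /I78"
parts = name3.split("/", 1)
print(show(parts))     # ['\I1234 ', '\I56 /I78']
print(parts)           # still a real list: ['\\I1234 ', '\\I56 /I78']

# Made-up input with a non-ASCII character, to compare both approaches.
accented = ["\\café ", "I56"]
round_trip = str(accented).encode("utf-8").decode("unicode_escape")
print(round_trip)      # the 'é' is replaced by two wrong characters
print(show(accented))  # ['\café ', 'I56']
```

This simple formatter does not escape quotes inside the elements, so it is meant for display only, not for producing text that can be parsed back.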

I'm not sure where your input values originally come from (e.g., a file, a website), so I added the ones from your question to a list for faster testing. The next example uses a list comprehension and str.split.

# Raw strings keep the backslashes literal
my_strings = [
    'I1234/I56/I78',
    r'\I1234 /I56/I78',
    r'\I1234 /\I56 /I78',
    r'\I1234 /\I56 /\I78',
    r'\I78_[0]/abcd_/efg_ /I1234/I56',
]

# Uses a list comprehension and str.split to split each string at the first "/"
split_input = [x.split('/', 1) for x in my_strings]

# Printing the list directly would show doubled backslashes, so decode them for display.
# decode_output is a str, not a list.
decode_output = str(split_input).encode('utf-8').decode('unicode_escape')

print(decode_output)
# outputs
# [['I1234', 'I56/I78'], ['\I1234 ', 'I56/I78'], ['\I1234 ', '\I56 /I78'], ['\I1234 ', '\I56 /\I78'], ['\I78_[0]', 'abcd_/efg_ /I1234/I56']]
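
For comparison with the regex in the question, the same first-slash split can be written without reversing the string. This is only a sketch and not the asker's pattern: `[^/]*` stops at the first "/", and re.split with maxsplit=1 is a second option. The names first_slash and samples are illustrative.

```python
import re

# Everything before the first "/" goes in group 1, the rest in group 2.
# re.DOTALL lets the second group span newlines, should the input contain any.
first_slash = re.compile(r"([^/]*)/(.*)", re.DOTALL)

samples = [
    "I1234/I56/I78",
    r"\I1234 /\I56 /\I78 ",
    r"\I78_[0]/abcd_/efg_ /I1234/I56",
]

for s in samples:
    match = first_slash.fullmatch(s)
    by_match = list(match.groups()) if match else [s]
    by_split = re.split("/", s, maxsplit=1)
    # Both regex variants agree with str.split for these inputs.
    assert by_match == by_split == s.split("/", 1)
    print(by_match)
```

Since all three approaches give the same result here, str.split or str.partition remains the simpler choice unless the separator itself needs a pattern.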
