
Best way to find a substring using regex in Python 3

I am trying to find the best way to extract a specific substring as a key-value pair using re, for input of the following form:

some_string-variable_length/some_no_variable_digit/some_no1_variable_digit/some_string1/some_string2
eg: aba/101/11111/cde/xyz or aaa/111/1119/cde/xzx or ada/21111/5/cxe/yyz

Here everything is variable, and what I am looking for is key-value output like the following:

`cde: 2` as there are two entries for cde

`cxe: 1` as there is only one entry for cxe

Note: everything is variable here except /, i.e. cde or cxe or some other string will appear exactly after two / in each case.

input: aba/101/11111/cde/xyz/blabla
output: cde:xyz/blabla
input: aaa/111/1119/cde/xzx/blabla
output: cde:xzx/blabla
input: aahjdsga/11231/1119/gfts/sjhgdshg/blabla
output: gfts:sjhgdshg/blabla

As you can see, the key is always the first string after the 3rd / and the value is always the substring after the key.

Here are a couple of solutions based on your description that "key is always the first string after 3rd / and value is always the substring after key". The first uses str.split with a maxsplit of 4 to collect everything after the fourth / into the value. The second uses a regex to extract the two parts:

inp = ['aba/101/11111/cde/xyz/blabla',
        'aaa/111/1119/cde/xzx/blabla',
        'aahjdsga/11231/1119/gfts/sjhgdshg/blabla'
        ]

for s in inp:
    parts = s.split('/', 4)
    key = parts[3]
    value = parts[4]
    print(f'{key}:{value}')

import re

for s in inp:
    m = re.match(r'^(?:[^/]*/){3}([^/]*)/(.*)$', s)
    if m is not None:
        key = m.group(1)
        value = m.group(2)
        print(f'{key}:{value}')

For both pieces of code the output is:

cde:xyz/blabla
cde:xzx/blabla
gfts:sjhgdshg/blabla
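
The question also asks for a count per key (cde: 2, cxe: 1). A minimal sketch of that, reusing the split approach from the first loop, might look like the following. The function name count_keys and the sample list are illustrative assumptions, not part of the original answer.

```python
from collections import Counter

# Sample paths taken from the question (the shorter form, without a trailing value).
paths = ['aba/101/11111/cde/xyz', 'aaa/111/1119/cde/xzx', 'ada/21111/5/cxe/yyz']

def count_keys(paths):
    """Count how often each key (the 4th '/'-separated field) appears."""
    counts = Counter()
    for path in paths:
        parts = path.split('/', 4)
        if len(parts) > 3:  # skip inputs with fewer than four fields
            counts[parts[3]] += 1
    return counts

key_counts = count_keys(paths)
for key, n in key_counts.items():
    print(f'{key}: {n}')
```

Counter keeps keys in first-seen order, so the printed order follows the input.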

Try (?<!\S)[^\s/]*(?:/[^\s/]*){2}/([^\s/]*)



Updated per the comment, try:

(?<!\S)[^\s/]*(?:/[^\s/]*){2}/([^\s/]*)(?:/(\S*))?

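
As a rough sketch of how a pattern like this could be used from Python: the version below reads the lookbehind as a negative lookbehind, (?<!\S), and the inner groups as non-capturing, (?:...), which is an assumption about the intended pattern. The sample text and variable names are made up for illustration.

```python
import re

# Group 1 is the key (the 4th '/'-separated field of a whitespace-delimited token);
# group 2 is everything after the key, or None when nothing follows it.
PATTERN = re.compile(r'(?<!\S)[^\s/]*(?:/[^\s/]*){2}/([^\s/]*)(?:/(\S*))?')

# Hypothetical input: several paths separated by whitespace in one string.
text = 'aba/101/11111/cde/xyz/blabla aaa/111/1119/cde/xzx/blabla ada/21111/5/cxe'

pairs = [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]
for key, value in pairs:
    print(f'{key}:{value}')
```

The (?<!\S) lookbehind anchors each match at the start of a token, so the pattern can scan a larger block of text rather than one path per string.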

Others have already posted various regexes. A broader question: is this problem best solved using a regex? Depending on how the data is formatted overall, it may be better parsed using

  • the .split('/') method on the string; or
  • csv.reader(..., delimiter='/') or csv.DictReader(..., delimiter='/') in the csv module.
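
For instance, a minimal sketch of the csv.reader option, assuming the data arrives one path per line (the in-memory io.StringIO stands in for a real file and is only an illustration):

```python
import csv
import io

# Stand-in for a file with one '/'-delimited record per line.
data = io.StringIO(
    'aba/101/11111/cde/xyz/blabla\n'
    'aaa/111/1119/cde/xzx/blabla\n'
)

rows = []
for fields in csv.reader(data, delimiter='/'):
    if len(fields) < 5:  # need at least a key and one value field
        continue
    key = fields[3]
    value = '/'.join(fields[4:])  # rejoin the remaining fields as the value
    rows.append((key, value))
    print(f'{key}:{value}')
```

Unlike split with a maxsplit, csv.reader splits on every delimiter, so the value has to be rejoined. That overhead only pays off if the data needs csv features such as quoting.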
