Slice a string in a Python list

I want to build a regex from structures like this:

    [['mirna', 'or', 'microrna'], 'or', 'lala']

...and I want to extract the left part of the 'or' recursively to build my regex. As you can see, sometimes it is another embedded list, sometimes it is a string.

My regex should look like:

((mirna|microrna)|lala)

So this is my algorithm (recursive because I never know how deep my structure is):

def _buildRegex(self,  request):
  if not isinstance(request,  str):
    print(request)
    print('request not a str')
    request = request[0]
  for i, e in enumerate(request):
    self._print(i)
    self._print(e)
    if e == 'or':
      self._print('OR found')
      if isinstance(request,  str):
        print('left is str')
        left = request
      else:
        print('left is list')
        left = request[0:i]

      if isinstance(request,  str):
        print('right is str')
        right = request
      else:
        print('right is list')
        right = request[i+1:len(request)-1]
      print('(')

      if isinstance(left,  list):
        self._buildRegex(left)
      else:
        print(left)
      print('|')
      if isinstance(right,  list):
        self._buildRegex(right)
      else:
        print(left)
      print(')')

And this is what I get:

    [[['mirna', 'or', 'microrna'], 'or', 'lala']]
    request not a str
    0
    ['mirna', 'or', 'microrna']
    1
    or
    OR found
    left is list
    right is list
    (
    [['mirna', 'or', 'microrna']]
    request not a str
    0
    mirna
    1
    or
    OR found
    left is list
    right is list
    (
    ['mirna']
    request not a str
    0
    m
    1
    i
    2
    r
    3
    n
    4
    a
    |
    []
    request not a str

I guess that when I extract the single word, the slice transforms it into a list. But how can I differentiate a final word from a list? I have spent many hours and can't find a solution; I am totally lost.
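To make the guess above concrete, here is a small sketch of how slicing and indexing differ on the inner list from the question. The variable names are only illustrative and are not part of the original code.

```python
request = ['mirna', 'or', 'microrna']
i = request.index('or')          # position of the 'or' separator

left_slice = request[0:i]        # slicing a list always yields a list
left_item = request[i - 1]       # indexing yields the element itself

# The right-hand slice as written in the question stops one element early,
# so for a three-element list it is empty.
right_as_written = request[i + 1:len(request) - 1]
right_full = request[i + 1:]     # slice to the end instead

print(left_slice)        # ['mirna']
print(left_item)         # mirna
print(right_as_written)  # []
print(right_full)        # ['microrna']
```

This matches the trace above: the recursion receives ['mirna'], takes its first element, and then iterates over the string character by character, while the right side arrives as [].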

I think your code has quite a few problems (such as not needing the outer wrapping list, and splitting strings into lists), so I've rewritten it here. You just need to recurse on lists, append '|' for 'or', and append the string for all other cases.

def buildRegex(request):
    result = '('
    for x in request:
        if not isinstance(x, str):
            result += buildRegex(x)
        elif x == 'or':
            result += '|'
        else:
            result += x

    result += ')'
    return result

inp = [['mirna', 'or', 'microrna'], 'or', 'lala']
print(buildRegex(inp))
inp = [['mirna', 'or', ['hello', 'or', 'microrna']], 'or', ['lala', 'or','lele']]
print(buildRegex(inp))

Outputs:

((mirna|microrna)|lala)
((mirna|(hello|microrna))|(lala|lele))

Edit: Here's a version with a list comprehension just for fun. It's less readable in my opinion though:

def buildRegex(request):
    return '(' + ''.join([buildRegex(x) if isinstance(x, list) else '|' if x == 'or' else x for x in request]) + ')'

Edit: As Francisco pointed out (not sure why he deleted his comment), it might be a good idea to replace result += x with result += re.escape(x) (with import re at the top) so that you can use characters like '|' directly in your strings.
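A minimal sketch of that suggestion, assuming the same input format as above. The name build_regex_escaped and the sample terms are hypothetical; only the re.escape call is the actual change.

```python
import re

def build_regex_escaped(request):
    # Hypothetical variant of buildRegex: same recursion, but literal terms
    # go through re.escape so regex metacharacters are matched literally.
    result = '('
    for x in request:
        if isinstance(x, list):
            result += build_regex_escaped(x)
        elif x == 'or':
            result += '|'
        else:
            result += re.escape(x)
    return result + ')'

# Made-up terms that contain the metacharacters '|' and '.'
pattern = build_regex_escaped([['a|b', 'or', 'c.d'], 'or', 'lala'])
compiled = re.compile(pattern)

print(bool(compiled.fullmatch('a|b')))  # True: the '|' inside the term is literal
print(bool(compiled.fullmatch('a')))    # False: 'a' alone is not an alternative
print(bool(compiled.fullmatch('cxd')))  # False: the '.' is literal, not a wildcard
```

Without the escaping, 'a|b' would be read as two alternatives and 'c.d' would match any character in the middle.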

This appears to be working for me:

def list_to_regex(input, final=''):
    if isinstance(input, list):
        if all([isinstance(x,str) for x in input]):
            # pure list found
            y = ''.join(['|' if z == 'or' else z for z in input])
            to_add = '(' + y + ')'
            return to_add
        else:
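            # mixed list: recurse on each element with a fresh accumulator,
            # otherwise the text built so far is duplicated in nested results
            for el in input:
                final += list_to_regex(el)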
            return '(' + final + ')'
    else:
        # just a string
        if input == 'or':
            return '|'
        else:
            return input

Sample usage:

l = [['mirna', 'or', ['hello', 'or', 'microrna']], 'or', ['lala', 'or','lele']]
print(list_to_regex(l))
# ((mirna|(hello|microrna))|(lala|lele))

This is kind of cheesy and I can already think of fringe cases. If you think about it, your nested list is already basically in the format you want, so just make it a string and do some replacements.

CODE:

data = [['mirna', 'or', 'microrna'], 'or', 'lala']
my_regex = str(data).replace(' ','').replace('[','(').replace(']',')').replace(",'or',",'|').replace("'",'').replace('"','')
print('my_regex='+my_regex)

It also works with the second test case from @Millie (thanks for making that!)

OUTPUT:

my_regex=((mirna|microrna)|lala)
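As a rough illustration of the fringe cases mentioned above, the sketch below wraps the same chain of replacements in a function (stringify_to_regex is a hypothetical name) and feeds it terms that contain a space or an apostrophe. The sample terms are made up for the demonstration.

```python
def stringify_to_regex(data):
    # Same chain of replacements as in the answer, wrapped in a function.
    return (str(data).replace(' ', '').replace('[', '(').replace(']', ')')
            .replace(",'or',", '|').replace("'", '').replace('"', ''))

# The original test case still works.
print(stringify_to_regex([['mirna', 'or', 'microrna'], 'or', 'lala']))
# ((mirna|microrna)|lala)

# A term containing a space silently loses it.
print(stringify_to_regex(['micro rna', 'or', 'lala']))
# (microrna|lala)

# A term containing an apostrophe silently loses that too.
print(stringify_to_regex(["5'utr", 'or', 'lala']))
# (5utr|lala)
```

Terms containing brackets or commas would be altered in a similar way, so this approach seems safest when the terms are plain words.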

Here's the code that works for me, with error checking:

def build_regex(req):
    if (type(req) != list and type(req) != str):
        print('Error: Incompatible types')
        return -1
    if type(req) == list and len(req) % 2 != 1:
        print("Even length, missing an or somewhere")
        return -1

    if type(req) == str:
        return req
    if len(req) == 1:
        return build_regex(req[0])
    if type(req[0]) == list:
        return '(' + build_regex(req[0]) + '|' + build_regex(req[2:]) + ')'
    if type(req[0]) == str:
        return '(' + req[0] + '|' + build_regex(req[2:]) + ')'

    print("Error: Incompatible element types.")
    print("Required str or list, found " + type(req[0]).__name__)
    return -1
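One thing to keep in mind with returning -1 on error: a caller that concatenates the result with '(' or '|' will then fail with a TypeError. A possible alternative, sketched here under the assumption that raising is acceptable, is to use exceptions for the same checks. The name build_regex_strict is hypothetical.

```python
def build_regex_strict(req):
    # Same logic as build_regex above, but errors are raised instead of
    # being printed and returned as -1.
    if isinstance(req, str):
        return req
    if not isinstance(req, list):
        raise TypeError('expected str or list, found ' + type(req).__name__)
    if len(req) % 2 != 1:
        raise ValueError("even length, missing an 'or' somewhere")
    if len(req) == 1:
        return build_regex_strict(req[0])
    # req[1] is assumed to be 'or', as in the original; it is skipped.
    return '(' + build_regex_strict(req[0]) + '|' + build_regex_strict(req[2:]) + ')'

print(build_regex_strict([['mirna', 'or', 'microrna'], 'or', 'lala']))
# ((mirna|microrna)|lala)

try:
    build_regex_strict(['mirna', 'or'])
except ValueError as err:
    print('rejected:', err)
```

Like the original, this nests longer chains to the right, so ['a', 'or', 'b', 'or', 'c'] becomes (a|(b|c)).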
