
Matching between two differently structured Python lists

I use Python 2.7 and I have two lists. One has this shape:

t1 = [('go', 'VB'), [('like', 'IN'), [('i', 'PR')]], [('to', 'TO')], [('there', 'RB')]]

The other is stored in a text file in this format:

t2 = [go:VB, [like:IN, [i:PR]], [to:TO], [there:RB]]

I would like to check whether t1 matches t2.

The problem I face is that the items in the text file have no quotes (''), so they look like variables.

Can you please help me find a way to match these two?

def match(t1, t2):
    #check here if the nested lists match or not.   
    return True

I tried converting t1 to a string, deleting '(' and ')' by replacing them with an empty string, and then replacing ',' with ':'. That left a lot of quotation marks behind, and I don't think it is a good way to solve the problem.

This answer does not use eval(), which is really insecure.

  1. Use str to convert t1 to a string.
  2. Delete all spaces in t1 and t2 with replace.
  3. Use re.sub to convert t2 (read from the file as a string).
  4. Finally, compare the two strings.

import re

### 1
t1 = str(t1)

### 2
t1 = t1.replace(" ", "")
t2 = t2.replace(" ", "")

### 3
t2 = re.sub(r"(\w+):(\w+)", r"('\1','\2')", t2)

### 4
print(t1 == t2)

Edit

If you also want to ignore tabs and newlines, replace step 2 with this:

### 2
t1 = "".join(t1.split())
t2 = "".join(t2.split())
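
Put together, the four steps can be wrapped in one function. This is only a minimal sketch: the name `match_by_string` and the inline sample text are illustrative, and it assumes that tokens and tags consist solely of word characters (`\w`) and contain no whitespace or quote characters.

```python
import re

def match_by_string(t1, t2_text):
    """Return True if the nested list t1 has the same textual form as t2_text."""
    # Steps 1 and 2: stringify t1 and drop all whitespace from both sides.
    s1 = "".join(str(t1).split())
    s2 = "".join(t2_text.split())
    # Step 3: rewrite every token:TAG pair as ('token','TAG').
    s2 = re.sub(r"(\w+):(\w+)", r"('\1','\2')", s2)
    # Step 4: plain string comparison.
    return s1 == s2

t1 = [('go', 'VB'), [('like', 'IN'), [('i', 'PR')]], [('to', 'TO')], [('there', 'RB')]]
t2_text = "[go:VB, [like:IN, [i:PR]], [to:TO], [there:RB]]"
print(match_by_string(t1, t2_text))
```

Because the comparison relies on how Python prints strings, a token containing a quote character would probably break it; the sketch does not try to handle that case.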

A naive and simple approach: use regex substitution to transform the string from the file into a form Python can evaluate, then (evil) eval it:

import re

s2 = '[go:VB, [like:IN, [i:PR]], [to:TO], [there:RB]]'

# 'go:VB' -> '("go", "VB")'
s2_pyth = re.sub(r'(\w+):(\w+)', r'("\1", "\2")', s2)
# '[("go", "VB"), [("like", "IN"), [("i", "PR")]], [("to", "TO")], [("there", "RB")]]'

l2 = eval(s2_pyth)
# [('go', 'VB'), [('like', 'IN'), [('i', 'PR')]], [('to', 'TO')], [('there', 'RB')]]

if t1 == l2:  # t1 is the list from the question; use a more specific comparison if needed
    print("match")

I think using eval in this context (which seems to be a harmless academic NLP task) is OK. If the tokens in your text file aren't strictly alphanumeric, you might need a smarter regex than r'\w+' to match them, maybe something like r'[^\[\]]+' ...
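
If the file could ever come from a source you don't control, `ast.literal_eval` from the standard library can stand in for `eval`. Note that this swaps in a different call from the one used above; the helper name `parse_tagged` is illustrative, and the sketch keeps the same `\w+` assumption about tokens.

```python
import ast
import re

def parse_tagged(text):
    """Parse '[go:VB, [like:IN]]' into [('go', 'VB'), [('like', 'IN')]]."""
    # Quote each token:TAG pair so the text becomes a Python literal.
    literal = re.sub(r'(\w+):(\w+)', r'("\1", "\2")', text)
    # literal_eval only accepts literals (lists, tuples, strings, numbers, ...),
    # so anything else in the file raises an error instead of being executed.
    return ast.literal_eval(literal)

t1 = [('go', 'VB'), [('like', 'IN'), [('i', 'PR')]], [('to', 'TO')], [('there', 'RB')]]
s2 = '[go:VB, [like:IN, [i:PR]], [to:TO], [there:RB]]'
print(parse_tagged(s2) == t1)
```

The result is a real nested list, so it can be compared with `==` or walked recursively for a more specific comparison.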

Assuming that your structure is composed only of lists and of tuples containing two strings, the following function should do what you want by generating your target string recursively:

def format_list(l):
  res = "["
  items = []
  for item in l:
    if isinstance(item,list):
      items.append(format_list(item))
    elif isinstance(item,tuple):
      items.append(item[0] + ':' + item[1])
  res += ", ".join(items) + "]"
  return res
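
A possible way to use this for the actual comparison is sketched below. The helper `matches_file_text` is a hypothetical name, and `format_list` is repeated (slightly condensed) only so that the snippet runs on its own. Whitespace is stripped from both sides on the assumption that spacing in the file may differ from the `", "` separator used here.

```python
def format_list(l):
    # Same logic as the function above, repeated so this snippet is self-contained.
    items = []
    for item in l:
        if isinstance(item, list):
            items.append(format_list(item))
        elif isinstance(item, tuple):
            items.append(item[0] + ':' + item[1])
    return "[" + ", ".join(items) + "]"

def matches_file_text(t1, file_text):
    # Compare with all whitespace removed so spacing differences are ignored.
    return "".join(format_list(t1).split()) == "".join(file_text.split())

t1 = [('go', 'VB'), [('like', 'IN'), [('i', 'PR')]], [('to', 'TO')], [('there', 'RB')]]
file_text = "[go:VB, [like:IN, [i:PR]], [to:TO], [there:RB]]"
print(matches_file_text(t1, file_text))
```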
