Convert this "list of a list of tuples" with 2 elements to a "list of tuples" with 3 elements

I want to convert this "list of a list of tuples" with 2 elements into a "list of tuples" with 3 elements. Each item pairs a list of (token, POS tag) tuples with a label:

[([('Yes', 'UH'),
   (',', ','),
   ('it', 'PRP'),
   ("'s", 'VBZ'),
   ('annoying', 'JJ'),
   ('and', 'CC'),
   ('cumbersome', 'JJ'),
   ('to', 'TO'),
   ('separate', 'VB'),
   ('your', 'PRP$'),
   ('rubbish', 'NN'),
   ('properly', 'RB'),
   ('all', 'PDT'),
   ('the', 'DT'),
   ('time', 'NN'),
   ('.', '.')],
  'P'),
 ([('Three', 'CD'),
   ('different', 'JJ'),
   ('bin', 'JJ'),
   ('bags', 'NNS'),
   ('stink', 'VBP'),
   ('away', 'RB'),
   ('in', 'IN'),
   ('the', 'DT'),
   ('kitchen', 'NN'),
   ('and', 'CC'),
   ('have', 'VB'),
   ('to', 'TO'),
   ('be', 'VB'),
   ('sorted', 'VBN'),
   ('into', 'IN'),
   ('different', 'JJ'),
   ('wheelie', 'NN'),
   ('bins', 'NNS'),
   ('.', '.')],
  'P'),
 ([('But', 'CC'),
   ('still', 'RB'),
   ('Germany', 'NNP'),
   ('produces', 'VBZ'),
   ('way', 'RB'),
   ('too', 'RB'),
   ('much', 'JJ'),
   ('rubbish', 'NN')],
  'P'),
 ([('and', 'CC'),
   ('too', 'RB'),
   ('many', 'JJ'),
   ('resources', 'NNS'),
   ('are', 'VBP'),
   ('lost', 'VBN'),
   ('when', 'WRB'),
   ('what', 'WP'),
   ('actually', 'RB'),
   ('should', 'MD'),
   ('be', 'VB'),
   ('separated', 'VBN'),
   ('and', 'CC'),
   ('recycled', 'VBN'),
   ('is', 'VBZ'),
   ('burnt', 'VBN'),
   ('.', '.')],
  'P'),
 ([('We', 'PRP'),
   ('Berliners', 'NNS'),
   ('should', 'MD'),
   ('take', 'VB'),
   ('the', 'DT'),
   ('chance', 'NN'),
   ('and', 'CC'),
   ('become', 'VB'),
   ('pioneers', 'NNS'),
   ('in', 'IN'),
   ('waste', 'NN'),
   ('separation', 'NN'),
   ('!', '.')],
  'C')]

into this list:

[[('Yes', 'UH', 'B-P'),
  (',', ',', 'I-P'),
  ('it', 'PRP', 'I-P'),
  ("'s", 'VBZ', 'I-P'),
  ('annoying', 'JJ', 'I-P'),
  ('and', 'CC', 'I-P'),
  ('cumbersome', 'JJ', 'I-P'),
  ('to', 'TO', 'I-P'),
  ('separate', 'VB', 'I-P'),
  ('your', 'PRP$', 'I-P'),
  ('rubbish', 'NN', 'I-P'),
  ('properly', 'RB', 'I-P'),
  ('all', 'PDT', 'I-P'),
  ('the', 'DT', 'I-P'),
  ('time', 'NN', 'I-P'),
  ('.', '.', 'I-P')],

 ...

 [('We', 'PRP', 'B-C'),
  ('Berliners', 'NNS', 'I-C'),
  ('should', 'MD', 'I-C'),
  ('take', 'VB', 'I-C'),
  ('the', 'DT', 'I-C'),
  ('chance', 'NN', 'I-C'),
  ('and', 'CC', 'I-C'),
  ('become', 'VB', 'I-C'),
  ('pioneers', 'NNS', 'I-C'),
  ('in', 'IN', 'I-C'),
  ('waste', 'NN', 'I-C'),
  ('separation', 'NN', 'I-C'),
  ('!', '.', 'I-C')]]
 

As you can see,

wherever the label is P, we add B-P (for the first token of the list) or I-P (for every other token) as the third member of the tuple;

wherever the label is C, we add B-C (for the first token of the list) or I-C (for every other token) as the third member of the tuple. This is called BIO tagging:

https://medium.com/analytics-vidhya/bio-tagged-text-to-original-text-99b05da6664#:~:text=The%20BIO%20%2F%20IOB%20format%20(short,named%2Dentity%20recognition)

I have tried different approaches but still could not find a solution:

listtoken=[]
listsent=[]
for lst in a:
    for tpl,l in zip(lst,b):
        c=(*tpl, l)
        listtoken.append(c)
    listsent.append(listtoken)

To add a single item to a tuple, you can use + with a one-element tuple, written (element,). So (1,2)+(3,) => (1,2,3)
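As a quick illustration of that point, here is a minimal sketch (the variable names are made up for the example) showing the concatenation, the equivalent unpacking form, and why the trailing comma matters:

```python
t = (1, 2)

extended = t + (3,)   # concatenate with a one-element tuple
unpacked = (*t, 3)    # same result, built with unpacking instead

# Without the trailing comma, (3) is just the integer 3 in parentheses,
# so t + (3) would raise a TypeError.
not_a_tuple = (3)

print(extended, unpacked, type(not_a_tuple).__name__)
```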

A list comprehension should do the job easily:

# with your list as L

R = [ [t+('BI'[i>0]+'-'+lb,) for i,t in enumerate(T)] for T,lb in L ]

Output:

print(R)

[ 
  [ ('Yes', 'UH', 'B-P'), (',', ',', 'I-P'), ('it', 'PRP', 'I-P'), ("'s", 'VBZ', 'I-P'), ('annoying', 'JJ', 'I-P'), ('and', 'CC', 'I-P'), ('cumbersome', 'JJ', 'I-P'), ('to', 'TO', 'I-P'), ('separate', 'VB', 'I-P'), ('your', 'PRP$', 'I-P'), ('rubbish', 'NN', 'I-P'), ('properly', 'RB', 'I-P'), ('all', 'PDT', 'I-P'), ('the', 'DT', 'I-P'), ('time', 'NN', 'I-P'), ('.', '.', 'I-P')], 
  [ ('Three', 'CD', 'B-P'), ('different', 'JJ', 'I-P'), ('bin', 'JJ', 'I-P'), ('bags', 'NNS', 'I-P'), ('stink', 'VBP', 'I-P'), ('away', 'RB', 'I-P'), ('in', 'IN', 'I-P'), ('the', 'DT', 'I-P'), ('kitchen', 'NN', 'I-P'), ('and', 'CC', 'I-P'), ('have', 'VB', 'I-P'), ('to', 'TO', 'I-P'), ('be', 'VB', 'I-P'), ('sorted', 'VBN', 'I-P'), ('into', 'IN', 'I-P'), ('different', 'JJ', 'I-P'), ('wheelie', 'NN', 'I-P'), ('bins', 'NNS', 'I-P'), ('.', '.', 'I-P')], 
  [ ('But', 'CC', 'B-P'), ('still', 'RB', 'I-P'), ('Germany', 'NNP', 'I-P'), ('produces', 'VBZ', 'I-P'), ('way', 'RB', 'I-P'), ('too', 'RB', 'I-P'), ('much', 'JJ', 'I-P'), ('rubbish', 'NN', 'I-P')], 
  [ ('and', 'CC', 'B-P'), ('too', 'RB', 'I-P'), ('many', 'JJ', 'I-P'), ('resources', 'NNS', 'I-P'), ('are', 'VBP', 'I-P'), ('lost', 'VBN', 'I-P'), ('when', 'WRB', 'I-P'), ('what', 'WP', 'I-P'), ('actually', 'RB', 'I-P'), ('should', 'MD', 'I-P'), ('be', 'VB', 'I-P'), ('separated', 'VBN', 'I-P'), ('and', 'CC', 'I-P'), ('recycled', 'VBN', 'I-P'), ('is', 'VBZ', 'I-P'), ('burnt', 'VBN', 'I-P'), ('.', '.', 'I-P')], 
  [ ('We', 'PRP', 'B-C'), ('Berliners', 'NNS', 'I-C'), ('should', 'MD', 'I-C'), ('take', 'VB', 'I-C'), ('the', 'DT', 'I-C'), ('chance', 'NN', 'I-C'), ('and', 'CC', 'I-C'), ('become', 'VB', 'I-C'), ('pioneers', 'NNS', 'I-C'), ('in', 'IN', 'I-C'), ('waste', 'NN', 'I-C'), ('separation', 'NN', 'I-C'), ('!', '.', 'I-C')]
]
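The expression 'BI'[i>0] in that comprehension works because a bool is an int in Python: for the first token i>0 is False (index 0, giving 'B'), and for every later token it is True (index 1, giving 'I'). If the one-liner is hard to read, the same logic can be spelled out as a loop. This is only a sketch of an equivalent version; the function name bio_tag and the shortened sample data are assumptions made for the example:

```python
def bio_tag(sentences):
    """Append a BIO tag to every (token, pos) tuple.

    `sentences` is a list of (tokens, label) pairs, shaped like the
    list in the question.
    """
    result = []
    for tokens, label in sentences:
        tagged = []  # a fresh list per sentence keeps sentences separate
        for i, tok in enumerate(tokens):
            prefix = 'B' if i == 0 else 'I'  # first token begins the chunk
            tagged.append(tok + (f'{prefix}-{label}',))
        result.append(tagged)
    return result


# Shortened sample in the same shape as the question's data.
sample = [
    ([('Yes', 'UH'), (',', ','), ('it', 'PRP')], 'P'),
    ([('We', 'PRP'), ('Berliners', 'NNS')], 'C'),
]
print(bio_tag(sample))
```

Note that the inner list is created inside the outer loop. Creating it once before the loop would keep appending every sentence's tokens to the same list.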

You can use a list comprehension with unpacking:

d = [([('Yes', 'UH'), (',', ','), ('it', 'PRP'), ("'s", 'VBZ'), ('annoying', 'JJ'), ('and', 'CC'), ('cumbersome', 'JJ'), ('to', 'TO'), ('separate', 'VB'), ('your', 'PRP$'), ('rubbish', 'NN'), ('properly', 'RB'), ('all', 'PDT'), ('the', 'DT'), ('time', 'NN'), ('.', '.')], 'P'), ([('Three', 'CD'), ('different', 'JJ'), ('bin', 'JJ'), ('bags', 'NNS'), ('stink', 'VBP'), ('away', 'RB'), ('in', 'IN'), ('the', 'DT'), ('kitchen', 'NN'), ('and', 'CC'), ('have', 'VB'), ('to', 'TO'), ('be', 'VB'), ('sorted', 'VBN'), ('into', 'IN'), ('different', 'JJ'), ('wheelie', 'NN'), ('bins', 'NNS'), ('.', '.')], 'P'), ([('But', 'CC'), ('still', 'RB'), ('Germany', 'NNP'), ('produces', 'VBZ'), ('way', 'RB'), ('too', 'RB'), ('much', 'JJ'), ('rubbish', 'NN')], 'P'), ([('and', 'CC'), ('too', 'RB'), ('many', 'JJ'), ('resources', 'NNS'), ('are', 'VBP'), ('lost', 'VBN'), ('when', 'WRB'), ('what', 'WP'), ('actually', 'RB'), ('should', 'MD'), ('be', 'VB'), ('separated', 'VBN'), ('and', 'CC'), ('recycled', 'VBN'), ('is', 'VBZ'), ('burnt', 'VBN'), ('.', '.')], 'P'), ([('We', 'PRP'), ('Berliners', 'NNS'), ('should', 'MD'), ('take', 'VB'), ('the', 'DT'), ('chance', 'NN'), ('and', 'CC'), ('become', 'VB'), ('pioneers', 'NNS'), ('in', 'IN'), ('waste', 'NN'), ('separation', 'NN'), ('!', '.')], 'C')]
new_d = [[(*a, f'B-{c}'), *[(*j, f'I-{c}') for j in b]] for [a, *b], c in d]

Output:

[[('Yes', 'UH', 'B-P'), (',', ',', 'I-P'), ('it', 'PRP', 'I-P'), ("'s", 'VBZ', 'I-P'), ('annoying', 'JJ', 'I-P'), ('and', 'CC', 'I-P'), ('cumbersome', 'JJ', 'I-P'), ('to', 'TO', 'I-P'), ('separate', 'VB', 'I-P'), ('your', 'PRP$', 'I-P'), ('rubbish', 'NN', 'I-P'), ('properly', 'RB', 'I-P'), ('all', 'PDT', 'I-P'), ('the', 'DT', 'I-P'), ('time', 'NN', 'I-P'), ('.', '.', 'I-P')], [('Three', 'CD', 'B-P'), ('different', 'JJ', 'I-P'), ('bin', 'JJ', 'I-P'), ('bags', 'NNS', 'I-P'), ('stink', 'VBP', 'I-P'), ('away', 'RB', 'I-P'), ('in', 'IN', 'I-P'), ('the', 'DT', 'I-P'), ('kitchen', 'NN', 'I-P'), ('and', 'CC', 'I-P'), ('have', 'VB', 'I-P'), ('to', 'TO', 'I-P'), ('be', 'VB', 'I-P'), ('sorted', 'VBN', 'I-P'), ('into', 'IN', 'I-P'), ('different', 'JJ', 'I-P'), ('wheelie', 'NN', 'I-P'), ('bins', 'NNS', 'I-P'), ('.', '.', 'I-P')], [('But', 'CC', 'B-P'), ('still', 'RB', 'I-P'), ('Germany', 'NNP', 'I-P'), ('produces', 'VBZ', 'I-P'), ('way', 'RB', 'I-P'), ('too', 'RB', 'I-P'), ('much', 'JJ', 'I-P'), ('rubbish', 'NN', 'I-P')], [('and', 'CC', 'B-P'), ('too', 'RB', 'I-P'), ('many', 'JJ', 'I-P'), ('resources', 'NNS', 'I-P'), ('are', 'VBP', 'I-P'), ('lost', 'VBN', 'I-P'), ('when', 'WRB', 'I-P'), ('what', 'WP', 'I-P'), ('actually', 'RB', 'I-P'), ('should', 'MD', 'I-P'), ('be', 'VB', 'I-P'), ('separated', 'VBN', 'I-P'), ('and', 'CC', 'I-P'), ('recycled', 'VBN', 'I-P'), ('is', 'VBZ', 'I-P'), ('burnt', 'VBN', 'I-P'), ('.', '.', 'I-P')], [('We', 'PRP', 'B-C'), ('Berliners', 'NNS', 'I-C'), ('should', 'MD', 'I-C'), ('take', 'VB', 'I-C'), ('the', 'DT', 'I-C'), ('chance', 'NN', 'I-C'), ('and', 'CC', 'I-C'), ('become', 'VB', 'I-C'), ('pioneers', 'NNS', 'I-C'), ('in', 'IN', 'I-C'), ('waste', 'NN', 'I-C'), ('separation', 'NN', 'I-C'), ('!', '.', 'I-C')]]
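Both answers produce one inner list per sentence. If what is wanted is a single flat list of 3-element tuples (one possible reading of the title, so treat this as an assumption), the nested result can be flattened with itertools.chain.from_iterable. The sketch below uses a shortened stand-in for the tagged result; the name flat is made up for the example:

```python
from itertools import chain

# Shortened stand-in for the nested, already-tagged result.
new_d = [
    [('Yes', 'UH', 'B-P'), (',', ',', 'I-P')],
    [('We', 'PRP', 'B-C'), ('Berliners', 'NNS', 'I-C')],
]

# Join the per-sentence lists into one list of tuples.
flat = list(chain.from_iterable(new_d))
print(flat)
```

The B- tags still mark where each sentence starts, so the sentence boundaries can be recovered from the flat list.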

