
Convert a list of lists of tuples to a tuple of lists of lists in python

I am writing a program for NER tagging with nltk and Mallet. I have to convert between two formats of input data that I cannot change.

The data basically contains words with their associated tags for supervised learning, but the data is also subdivided into sentences, hence the list of lists.

The first format is

tuple(list(list(word)),list(list(tag))) 

and the second format is

list(list(tuple(word,tag)))

Currently I am converting it like this (format 2 => format 1):

([[tup[0] for tup in sent] for sent in train_set],
 [[tup[1] for tup in sent] for sent in train_set])

Sample data:

 [[('Steve','PERSON'),('runs','NONE'),('Apple','ORGANIZATION')],[('Today','NONE'),('is','NONE'),('June','DATETIME'),('27th','DATETIME')]]

and expected output:

 ([['Steve', 'runs', 'Apple' ],['Today','is','June','27th']],
  [['PERSON','NONE','ORGANIZATION'],['NONE','NONE','DATETIME','DATETIME']])

I perform the conversion in both directions.

EDIT: I don't necessarily want it to be shorter - please just suggest a better (and more readable) way of doing it in Python 2.7 (with a code sample).

Converting tuple(list(list(word)),list(list(tag))) to list(list(tuple(word,tag))) is easy:

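def convert(data_structure):
    # data_structure is (sentences, tags): two parallel lists of lists
    sentences, tags = data_structure
    container = []
    for i in xrange(len(sentences)):
        # pair each word with its tag; zip returns a list in Python 2
        container.append(zip(sentences[i], tags[i]))

    return container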
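
A more compact variant of the same idea, as a sketch rather than a definitive version: zip the two outer lists together so that no index is needed, then zip each sentence with its tag list. The name `to_tagged_sentences` is made up for this illustration. The `list(...)` call is redundant on Python 2.7 but keeps each sentence a list on Python 3, where `zip` returns an iterator.

```python
def to_tagged_sentences(data_structure):
    # Assumed input: (sentences, tags), two parallel lists of lists.
    sentences, tags = data_structure
    # Outer zip walks sentences and tag lists in step;
    # inner zip pairs each word with its tag.
    return [list(zip(words, sent_tags))
            for words, sent_tags in zip(sentences, tags)]


words_and_tags = ([['Steve', 'runs', 'Apple'], ['Today', 'is', 'June', '27th']],
                  [['PERSON', 'NONE', 'ORGANIZATION'],
                   ['NONE', 'NONE', 'DATETIME', 'DATETIME']])
tagged = to_tagged_sentences(words_and_tags)
```

Keep in mind that `zip` silently stops at the shorter input, so a sentence whose word and tag lists differ in length would be truncated without an error.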

The code for converting in the other direction is a bit longer, but not very complicated if you simply use nested for loops:

def convert(data_structure):
    sentences = []
    tags = []

    for sentence in data_structure:
        sentence_words = []
        sentence_tags = []

        for word, tag in sentence:
            sentence_words.append(word)
            sentence_tags.append(tag)

        sentences.append(sentence_words)
        tags.append(sentence_tags)

    return (sentences, tags)
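
If the explicit loops feel too long, the same traversal can be folded into two comprehensions that unpack each pair by name instead of indexing with `tup[0]` and `tup[1]`. This is only a sketch; the name `to_words_and_tags` is hypothetical, and it assumes every inner tuple has exactly two elements.

```python
def to_words_and_tags(tagged_sentences):
    # Assumed input: a list of sentences, each a list of (word, tag) pairs.
    sentences = [[word for word, _tag in sent] for sent in tagged_sentences]
    tags = [[tag for _word, tag in sent] for sent in tagged_sentences]
    return (sentences, tags)


train_set = [[('Steve', 'PERSON'), ('runs', 'NONE'), ('Apple', 'ORGANIZATION')],
             [('Today', 'NONE'), ('is', 'NONE'),
              ('June', 'DATETIME'), ('27th', 'DATETIME')]]
words, tags = to_words_and_tags(train_set)
```

It walks the data twice, which is usually negligible for training sets that already fit in memory, and it handles empty sentences without special cases.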

Perhaps the code can be shortened further, but hopefully the general principle is clear.

You can convert the inner tuples to iterators (using iter) and then call next on them in a nested list comprehension:

lis = [[('Steve','PERSON'),('runs','NONE'),('Apple','ORGANIZATION')],
       [('Today','NONE'),('is','NONE'),('June','DATETIME'),('27th','DATETIME')]]

it = [[iter(y) for y in x] for x in lis]
n = len(lis[0][0])  #Number of iterations required.
print [[[next(x) for x in i] for i in it] for _ in range(n)]

Output:

[[['Steve', 'runs', 'Apple'], ['Today', 'is', 'June', '27th']],
 [['PERSON', 'NONE', 'ORGANIZATION'], ['NONE', 'NONE', 'DATETIME', 'DATETIME']]]
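
Note that the result above is a list of two lists, while the question asks for a tuple. A possible way to package the same iterator trick as a function that returns a tuple is sketched below. The name `split_with_iterators` is invented here, and the sketch assumes every inner tuple is a `(word, tag)` pair, so the number of passes is fixed at 2 instead of being read from `lis[0][0]` (which would fail on empty input).

```python
def split_with_iterators(tagged_sentences):
    # One iterator per (word, tag) tuple; each next() call yields its next field.
    iterators = [[iter(pair) for pair in sent] for sent in tagged_sentences]
    n = 2  # assumption: every tuple has exactly two fields, word then tag
    # First pass collects every word, second pass collects every tag.
    return tuple([[next(it) for it in sent] for sent in iterators]
                 for _ in range(n))


lis = [[('Steve', 'PERSON'), ('runs', 'NONE'), ('Apple', 'ORGANIZATION')],
       [('Today', 'NONE'), ('is', 'NONE'),
        ('June', 'DATETIME'), ('27th', 'DATETIME')]]
result = split_with_iterators(lis)
```

The approach relies on the passes running in order, because each iterator remembers how far it has been consumed.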

I think the correct solution will be this one:

>>> data = [[('Steve','PERSON'),('runs','NONE'),('Apple','ORGANIZATION')],[('Today','NONE'),('is','NONE'),('June','DATETIME'),('27th','DATETIME')]]
>>> tuple([ map(list, (zip(*x))) for x in data ])
([['Steve', 'runs', 'Apple'], ['PERSON', 'NONE', 'ORGANIZATION']], [['Today', 'is', 'June', '27th'], ['NONE', 'NONE', 'DATETIME', 'DATETIME']])
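
The output above groups words and tags per sentence, whereas the expected output in the question puts all word lists first and all tag lists second. One way to get there with the same `zip(*...)` idiom is to transpose a second time, as in the sketch below. The name `split_with_zip` is hypothetical, and the sketch assumes a non-empty input in which every sentence contains at least one pair; an empty sentence would make the outer `zip` drop data.

```python
def split_with_zip(tagged_sentences):
    # zip(*sent) turns [(w1, t1), (w2, t2)] into (w1, w2), (t1, t2).
    per_sentence = [[list(column) for column in zip(*sent)]
                    for sent in tagged_sentences]
    # A second zip(*...) regroups the [words, tags] pairs
    # into (all word lists, all tag lists).
    return tuple(list(group) for group in zip(*per_sentence))


data = [[('Steve', 'PERSON'), ('runs', 'NONE'), ('Apple', 'ORGANIZATION')],
        [('Today', 'NONE'), ('is', 'NONE'),
         ('June', 'DATETIME'), ('27th', 'DATETIME')]]
converted = split_with_zip(data)
```

The explicit `list(...)` calls make the sketch behave the same on Python 2.7 and Python 3.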
