How to get a new list every time an element changes (the element is a specific index of each tuple in a list of tuples)

Question

I have a list filled with lists like this one: ['L1045', 'u0', 'm0', 'BIANCA', 'They do not!'] and this one: ['L1981', 'u16', 'm1', 'COLUMBUS', "I haven't given you much of a life."], parsed from the Cornell Movie-Dialogs Corpus, where index 0 is the dialogue line ID, index 2 is the movie ID, and index 4 is the line itself. There are many lines from each movie, so many lists have identical items at index 2 (many 'm0's, for example). The corpus does not contain every line of each movie, though, so the items at index 0 are sequential in places but have gaps elsewhere (for example, a particular movie might have 'L99', 'L100', and 'L102', followed by a gap from 103 to 179).

Basically, I'm trying to create a separate list of line strings for every run of sequential lines in each movie, that is, a separate list of lines for each "scene" of each movie.

I'm having a very hard time getting there. I don't know whether I should build a dictionary in which each unique movie (index 2) is a key whose value is a list of tuples, each holding the line number and the line itself, and then use some kind of counter to check for gaps in the line numbers. If I go this route, I'm struggling to even figure out how to do this for each specific movie...

Any help would be tremendously appreciated!

Below is some code that I know doesn't work, but it shows some of my initial thinking:

movie_lines = 'DIRECTORYPATH/movie_lines.txt'
with open(movie_lines, "r", encoding="ISO-8859-1") as fh:
    lines_chunks = [line.split(" +++$+++ ") for line in fh]

number = 0
counter = 'm' + str(number)
new_list = []

for i in range(616):  
    number = 0
    counter = 'm' + str(number)

    for line in lines_chunks:
        if line[2] == counter:
            new_list.append([(line[2], line[0], line[4])])
        number += 1

Answer 1

Here's my approach:

I'd use a nested dictionary to store the data:

data = {'movie_id': {'scene_id': [(int(line_id), character, actual_line), ...]}}

This way, if you want to retrieve all lines from a particular scene in a particular movie, you just need to look up data['movie']['scene'], and the result is a list of tuples.

Here's the code:

movie_lines = 'movie_lines.txt'
# Same encoding as in the question; the platform default may fail on some bytes.
with open(movie_lines, "r", encoding="ISO-8859-1") as f:
    lines = [line.split(' +++$+++ ') for line in f]

data = dict()

for line in lines:
    # line[0] --> line_id
    # line[1] --> scene_id (note: the corpus README describes this field as the character ID)
    # line[2] --> movie_id
    # line[3] --> character name
    # line[4] --> actual_line
    entry = (int(line[0][1:]), line[3], line[4])
    # Create the movie and scene levels on first sight, then append the entry.
    data.setdefault(line[2], {}).setdefault(line[1], []).append(entry)

# taking movie 'm0' and scene 'u0' as an example
test = data['m0']['u0']
test.sort()  # tuples compare by their first element (the line number) first
print(test)

int(line[0][1:]) converts the line ID "Lxxx" to an integer so that sorting later is easy.
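To illustrate why the conversion matters, here is a small sketch comparing the two sort orders. The sample IDs are made up in the corpus's "L<number>" format and are used only for illustration.

```python
# Hypothetical sample IDs in the corpus's "L<number>" format.
line_ids = ["L1045", "L99", "L100", "L102"]

# As strings, IDs compare character by character, so "L1045" sorts before "L99".
as_strings = sorted(line_ids)

# Stripping the leading "L" and converting to int gives numeric order.
as_numbers = sorted(int(line_id[1:]) for line_id in line_ids)

print(as_strings)  # ['L100', 'L102', 'L1045', 'L99']
print(as_numbers)  # [99, 100, 102, 1045]
```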

Output:

[(49, 'BIANCA', 'Did you change your hair?\n'), (51, 'BIANCA', 'You might wanna think about it\n'), (165, 'BIANCA', 'Nowhere... Hi, Daddy.\n'), (179, 'BIANCA', "Now don't get upset. Daddy, but there's this boy... and I think he might ask...\n"), ..., (1021, 'BIANCA', 'Is that woman a complete fruit-loop or is it just me?\n'), (1045, 'BIANCA', 'They do not!\n'), (1051, 'BIANCA', 'Patrick -- is that- a.\n')]
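To get from sorted line numbers to the separate "scene" lists the question asks for, one option is to start a new list wherever the line numbers of a movie stop being consecutive. The following is only a sketch under that assumption (a gap in line numbers marks a scene boundary, as the question proposes). The function name split_into_runs and the sample rows are hypothetical, and the line text is stripped of its trailing newline here.

```python
from collections import defaultdict


def split_into_runs(rows):
    """Group line texts by movie, starting a new list at every gap in line numbers.

    rows: iterable of [line_id, character_id, movie_id, character_name, text].
    Returns {movie_id: [[text, text, ...], [text, ...], ...]}.
    """
    by_movie = defaultdict(list)
    for row in rows:
        line_number = int(row[0][1:])  # "L1045" -> 1045
        by_movie[row[2]].append((line_number, row[4].strip()))

    runs = {}
    for movie_id, numbered in by_movie.items():
        numbered.sort()
        movie_runs = []
        previous = None
        for line_number, text in numbered:
            # A gap (or the very first line) opens a new run.
            if previous is None or line_number != previous + 1:
                movie_runs.append([])
            movie_runs[-1].append(text)
            previous = line_number
        runs[movie_id] = movie_runs
    return runs


# Made-up rows for illustration only; not taken from the corpus.
sample = [
    ["L99", "u0", "m0", "A", "first\n"],
    ["L100", "u1", "m0", "B", "second\n"],
    ["L102", "u0", "m0", "A", "after a gap\n"],
    ["L5", "u9", "m1", "C", "other movie\n"],
]
result = split_into_runs(sample)
print(result)
# {'m0': [['first', 'second'], ['after a gap']], 'm1': [['other movie']]}
```

With the real file, the rows would be the split lines read from movie_lines.txt. The corpus also ships a movie_conversations.txt file that lists the line IDs belonging to each conversation, which may be a more reliable source of boundaries than gaps in the numbering.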

Hope this helps. Cheers.

Solution 1
0 2019-06-15 15:40:17
