How to split a txt file into two lists, then split one list into its captions

Question

I have this text file:

1e.jpg#0   A dog going for a walk .
2e.jpg#1   A boy is going to swim 
3e.jpg#2   A girl is chasing the cat .
4e.jpg#3   Three people are going to a hockey game

I need to split it into two separate lists: one for the IDs and one for the sentences. This is where I need help, because I then need to split the sentences list into the following:

[["a", "dog", "going", "for", "a"...], ["a",......]]

This is how far I got:

path = "s.txt"

l1 = []
l2 = []
read_file=open(path, "r")
split = [line.strip() for line in read_file]
for line in split:
    l1.append(line.split("\t")[0])
    l2.append(line.split("\t")[1:])
    
print(l2)

Answer 1

You can use the same principle: called with no arguments, split() splits on whitespace. I also removed the : from l2.append(line.split("\t")[1:]) so that it appends a string instead of a list with one element:

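path = "s.txt"

l1 = []  # image IDs, e.g. "1e.jpg#0"
l2 = []  # sentences, as plain strings
# "with" closes the file automatically, even if an error occurs
with open(path, "r") as read_file:
    for line in read_file:
        parts = line.strip().split("\t")
        l1.append(parts[0])
        l2.append(parts[1])  # [1] gives the string, [1:] would give a list

# split() with no arguments splits on any run of whitespace
words_list = []
for s in l2:
    words_list.append(s.split())

print(words_list)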
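
To make the difference between [1] and [1:] concrete, here is a small sketch that works on one in-memory line instead of the file. The sample string and the names sample_line and fields are made up for illustration, and it assumes a single tab separates the ID from the sentence, as in the question's code.

```python
# Hypothetical sample line; assumes a tab separates the ID from the sentence.
sample_line = "1e.jpg#0\tA dog going for a walk ."

fields = sample_line.split("\t")

as_slice = fields[1:]   # slicing always returns a list
as_item = fields[1]     # indexing returns the element itself (a str here)

print(as_slice)         # ['A dog going for a walk .']
print(as_item)          # A dog going for a walk .

# Only the string can be split further into words.
words = as_item.split()
print(words)            # ['A', 'dog', 'going', 'for', 'a', 'walk', '.']
```

Calling .split() on as_slice would raise AttributeError, because a list has no split method.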

Answer 2

If you don't mind punctuation ending up in your lists, you can split the sentence string directly in your current code (assuming each line contains exactly one tab character):

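path = "s.txt"  # same file as in the question

l1 = []  # image IDs
l2 = []  # one list of words per sentence
with open(path, "r") as read_file:  # closes the file automatically
    for line in read_file:
        parts = line.strip().split("\t")
        l1.append(parts[0])
        l2.append(parts[1].split())  # split the sentence on whitespace

print(l2)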

Output:

[['A', 'dog', 'going', 'for', 'a', 'walk', '.'], ['A', 'boy', 'is', 'going', 'to', 'swim'], ['A', 'girl', 'is', 'chasing', 'the', 'cat', '.'], ['Three', 'people', 'are', 'going', 'to', 'a', 'hockey', 'game']]
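
The question's desired output shows lowercase words, and the code above relies on there being exactly one tab per line. A possible variation, offered as a sketch and not as part of the original answer, is to split on the first tab only with split("\t", 1) and to lowercase the sentence before splitting it. The function name split_captions and the sample lines are hypothetical; the lines stand in for the contents of s.txt.

```python
def split_captions(lines):
    """Return (ids, word_lists) from lines shaped like 'ID<TAB>sentence'."""
    ids = []
    word_lists = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # maxsplit=1: any further tabs stay inside the sentence part
        image_id, sentence = line.split("\t", 1)
        ids.append(image_id)
        # lower() before split() so "A" becomes "a"
        word_lists.append(sentence.lower().split())
    return ids, word_lists


# Hypothetical in-memory stand-in for the file contents.
sample_lines = [
    "1e.jpg#0\tA dog going for a walk .\n",
    "2e.jpg#1\tA boy is going to swim \n",
]

ids, word_lists = split_captions(sample_lines)
print(ids)
print(word_lists)
```

A line with no tab at all raises ValueError at the unpacking step, which may or may not be the behavior you want for malformed input.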

If you want to remove non-word elements, you can use re.split:

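import re

# A space, optionally with one non-word character on either side
split_pattern = re.compile(r'\W? \W?')

path = "s.txt"  # same file as in the question

l1 = []  # image IDs
l2 = []  # one list of words per sentence
with open(path, "r") as read_file:  # closes the file automatically
    for line in read_file:
        parts = line.strip().split("\t")
        l1.append(parts[0])
        # filter out the empty strings that re.split can leave behind
        word_list = [x for x in split_pattern.split(parts[1]) if x]
        l2.append(word_list)

print(l2)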

Output:

[['A', 'dog', 'going', 'for', 'a', 'walk'], ['A', 'boy', 'is', 'going', 'to', 'swim'], ['A', 'girl', 'is', 'chasing', 'the', 'cat'], ['Three', 'people', 'are', 'going', 'to', 'a', 'hockey', 'game']]
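
One limitation worth noting: the pattern above only drops punctuation that sits next to a space, so a sentence ending in "cat." with no space before the period would keep "cat." as a word. An alternative sketch, not part of the original answer, is to collect the word characters directly with re.findall and the pattern \w+. The sample sentences and the name word_pattern are made up for illustration.

```python
import re

# Hypothetical sentences; the second has punctuation glued to the last word.
sentences = [
    "A dog going for a walk .",
    "A girl is chasing the cat.",
]

# \w+ matches runs of letters, digits and underscores, so punctuation
# and whitespace are skipped wherever they appear.
word_pattern = re.compile(r"\w+")

word_lists = [word_pattern.findall(s) for s in sentences]
print(word_lists)
```

The trade-off is that \w+ also breaks contractions apart, so "don't" would come out as "don" and "t".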
