
How to split a txt file into two lists and then split one list into its captions

I have this text file:

1e.jpg#0   A dog going for a walk .
2e.jpg#1   A boy is going to swim 
3e.jpg#2   A girl is chasing the cat .
4e.jpg#3   Three people are going to a hockey game

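I need to split it into two separate lists: one for the IDs and one for the sentences. This is where I need help, because I then need to split the list of sentences into the following: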

[["a", "dog", "going", "for", "a"...], ["a",......]] 

This is how far I have gotten:

path = "s.txt"

l1 = []
l2 = []
read_file=open(path, "r")
split = [line.strip() for line in read_file]
for line in split:
    l1.append(line.split("\t")[0])
    l2.append(line.split("\t")[1:])
    
print(l2)

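You can use the same principle. By default, the split function splits on whitespace. I also removed the : from l2.append(line.split("\t")[1:]) so that it returns a string instead of a list containing one element: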

path = "s.txt"

l1 = []
l2 = []
# The with block closes the file once the lines have been read
with open(path, "r") as read_file:
    split = [line.strip() for line in read_file]
for line in split:
    l1.append(line.split("\t")[0])
    l2.append(line.split("\t")[1])
    
words_list = []
for s in l2:
    words_list.append(s.split())

print(words_list)

If you don't mind punctuation being added to the lists, you can split the string in your current code (assuming only one tab occurs per line):

path = "s.txt"

l1 = []
l2 = []
# The with block closes the file once the lines have been read
with open(path, "r") as read_file:
    split = [line.strip() for line in read_file]
for line in split:
    l1.append(line.split("\t")[0])
    l2.append(line.split("\t")[1].split())
    
print(l2)

Output:

[['A', 'dog', 'going', 'for', 'a', 'walk', '.'], ['A', 'boy', 'is', 'going', 'to', 'swim'], ['A', 'girl', 'is', 'chasing', 'the', 'cat', '.'], ['Three', 'people', 'are', 'going', 'to', 'a', 'hockey', 'game']]
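The single-tab assumption above can be relaxed. The following is a minimal sketch, not part of the original answer: it passes maxsplit=1 to split so that only the first tab separates the ID from the caption. The io.StringIO object and its sample lines (including the extra tab in the second caption) are made up here to stand in for the real file. A line with no tab at all would still raise a ValueError when unpacking.

```python
import io

# Stand-in for open(path, "r"); the second line has an extra tab inside the caption.
fake_file = io.StringIO(
    "1e.jpg#0\tA dog going for a walk .\n"
    "2e.jpg#1\tA boy\tis going to swim \n"
)

l1 = []
l2 = []
for line in fake_file:
    # maxsplit=1 splits at the first tab only, so the caption stays in one piece
    image_id, caption = line.strip().split("\t", 1)
    l1.append(image_id)
    # split() with no arguments splits on any run of whitespace, including tabs
    l2.append(caption.split())

print(l1)
print(l2)
```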

If you want to remove non-word elements, you can use re.split:

import re
split_pattern = re.compile(r'\W? \W?')

path = "s.txt"

l1 = []
l2 = []
# The with block closes the file once the lines have been read
with open(path, "r") as read_file:
    split = [line.strip() for line in read_file]
for line in split:
    l1.append(line.split("\t")[0])
    word_list = [x for x in re.split(split_pattern, line.split("\t")[1]) if x]
    l2.append(word_list)
    
print(l2)

Output:

[['A', 'dog', 'going', 'for', 'a', 'walk'], ['A', 'boy', 'is', 'going', 'to', 'swim'], ['A', 'girl', 'is', 'chasing', 'the', 'cat'], ['Three', 'people', 'are', 'going', 'to', 'a', 'hockey', 'game']]
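The target list in the question uses lowercase words ("a", "dog", ...), which none of the outputs above produce. As a hedged sketch of one way to get there, the helper below (split_captions is a hypothetical name, not from the question or answers) lowercases each caption and collects words with re.findall(r"\w+", ...) instead of re.split. It assumes each line has the form ID, tab, caption, and the sample list is invented for illustration.

```python
import re

def split_captions(lines):
    """Split 'id<TAB>caption' lines into a list of ids and a list of lowercase word lists."""
    ids = []
    captions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # partition splits on the first tab only; caption is "" if there is no tab
        image_id, _, caption = line.partition("\t")
        ids.append(image_id)
        # \w+ matches runs of letters, digits and underscores, so punctuation is dropped
        captions.append(re.findall(r"\w+", caption.lower()))
    return ids, captions

# Made-up sample lines standing in for the contents of s.txt
sample = [
    "1e.jpg#0\tA dog going for a walk .\n",
    "2e.jpg#1\tA boy is going to swim \n",
]
ids, captions = split_captions(sample)
print(ids)
print(captions)
```

Because the function accepts any iterable of lines, an open file object can be passed directly, for example inside `with open(path, "r") as read_file:`.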
