
Python NLTK - Tokenize paragraphs into sentences and words

I have some paragraphs of text in a .txt file. I am trying to tokenize the paragraphs and append them as lists of sentences and words. I'm not sure what I'm doing wrong, because I managed to get the sentences but not the words. I've been banging my head against the wall over this!

Input:

This is sentence one,
Another sentence:
Third line.

Desired output:

[
 ['This', 'is', 'sentence', 'one', ','],
 ['Another', 'sentence', ':'],
 ['Third', 'line', '.']
]

My faulty code and output:

from nltk.tokenize import sent_tokenize, word_tokenize

sentences = []
sentences_split_into_words = []

with open('file.txt') as file:
    for line in file:
        sentences.append(sent_tokenize(line))

for line in sentences:
    words_token = [word_tokenize(i) for i in line]
    sentences_split_into_words.append(words_token)

----Result----
    [
     [['This', 'is', 'sentence', 'one', ',']],
     [['Another', 'sentence', ':']],
     [['Third', 'line', '.']]
    ]

I also tried the following, but it returns the error "expected string or bytes-like object":

for line in sentences:
    sentences_split_into_words.append(word_tokenize(line))
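The error happens because `sentences.append(sent_tokenize(line))` appends a whole list for each line, so iterating over `sentences` yields lists, not strings, and `word_tokenize` rejects a list. A minimal sketch of the shape problem, with `sent_tokenize` stubbed out by a hypothetical `fake_sent_tokenize` so it runs without NLTK's punkt data:

```python
def fake_sent_tokenize(line):
    # stand-in for nltk's sent_tokenize: treat each stripped line
    # as a single sentence, returned inside a list (as nltk does)
    return [line.strip()]

lines = ['This is sentence one,\n', 'Another sentence:\n', 'Third line.\n']

sentences = []
for line in lines:
    sentences.append(fake_sent_tokenize(line))

# Each element of `sentences` is itself a list, not a string:
print(sentences[0])        # ['This is sentence one,']
print(type(sentences[0]))  # <class 'list'>
# word_tokenize(sentences[0]) would therefore raise
# "TypeError: expected string or bytes-like object"
```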

Try this code. The fix is to use extend() instead of append() in the second loop, so the per-line token lists are flattened into a single list instead of being nested one level deeper:

from nltk.tokenize import sent_tokenize, word_tokenize

sentences = []
with open('file.txt') as file:
    for line in file:
        sentences.append(sent_tokenize(line))

sentences_split_into_words = []
for line in sentences:
    words_token = [word_tokenize(i) for i in line]
    sentences_split_into_words.extend(words_token)

Reference: https://www.programiz.com/python-programming/methods/list/extend
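The append-vs-extend difference can be seen without NLTK at all, using `str.split` as a stand-in tokenizer (note the real `word_tokenize` would also split off punctuation such as ',' and '.'):

```python
# `sentences` as produced by the first loop: one list per input line
sentences = [['This is sentence one,'], ['Another sentence:'], ['Third line.']]

appended, extended = [], []
for line in sentences:
    words_token = [s.split() for s in line]  # stand-in for word_tokenize
    appended.append(words_token)   # adds the whole list -> extra nesting
    extended.extend(words_token)   # adds the items -> desired flat shape

print(appended[0])  # [['This', 'is', 'sentence', 'one,']]
print(extended[0])  # ['This', 'is', 'sentence', 'one,']
```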


Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM