
Put sentences into a list - Python

I understand that nltk can split text into sentences and print them out using the following code. But how do I put the sentences into a list instead of outputting them to the screen?

import nltk.data
from nltk.tokenize import sent_tokenize
import os, sys, re, glob
cwd = './extract_en' #os.getcwd()
for infile in glob.glob(os.path.join(cwd, 'fileX.txt')):
    (PATH, FILENAME) = os.path.split(infile)
    read = open(infile)
    for line in read:
        sent_tokenize(line)

The sent_tokenize(line) call prints the sentences out. How do I put them into a list?

Here's a simplified version that I used to test the code:

import nltk.data
from nltk.tokenize import sent_tokenize
import sys
infile = open(sys.argv[1])
slist = []
for line in infile:
    slist.append(sent_tokenize(line))
print slist
infile.close()

When called like so, it prints the following:

me@mine:~/src/ $ python nltkplay.py nltkplay.py 
[['import nltk.data\n'], ['from nltk.tokenize import sent_tokenize\n'], ['import sys\n'], ['infile = open(sys.argv[1])\n'], ['slist = []\n'], ['for line in infile:\n'], ['    slist.append(sent_tokenize(line))\n'], ['print slist\n'], ['\n']]

When doing something like this, a list comprehension is more concise and, IMO, more pleasant to read:

slist = [sent_tokenize(line) for line in infile]

To clarify, the above returns a list of lists of sentences, one list of sentences for each line. If you want a flat list of sentences, do this instead, as eyquem suggests:

slist = sent_tokenize(infile.read())
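
To make the difference between the two results concrete, here is a minimal sketch. It assumes a naive regex-based splitter, naive_sent_split, which is a hypothetical stand-in and not part of NLTK, so the example runs without NLTK or its tokenizer data. With NLTK installed, sent_tokenize would take its place.

```python
import io
import re
from itertools import chain


def naive_sent_split(text):
    """Hypothetical stand-in for nltk's sent_tokenize (assumption, not NLTK).

    Splits after '.', '!' or '?' when followed by whitespace. Far cruder
    than NLTK; for illustration only.
    """
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


text = "First sentence. Second one!\nThird sentence?\n"

# Tokenizing line by line gives a list of lists, one inner list per line.
nested = [naive_sent_split(line) for line in io.StringIO(text)]

# Tokenizing the whole text at once gives a flat list of sentences.
flat = naive_sent_split(io.StringIO(text).read())

# A nested result can also be flattened after the fact.
flattened = list(chain.from_iterable(nested))

print(nested)
print(flat)
print(flattened)
```

Reading the whole file at once has a second advantage: a sentence that wraps across a line break is cut into pieces when each line is tokenized separately.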

Avoid using read as a variable name in your program. It is not a reserved keyword in Python, but it is easily confused with the read() method of file objects.


If you want to append to a list, you must first create one:

reclist = []
for line in f:
    reclist.append(line)

or with a list comprehension

reclist = [ line for line in f ]

or using Python's built-in tools

reclist = f.readlines()

Or perhaps I didn't understand what you want.
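
As a quick check that the three approaches above build the same list, here is a small sketch. It assumes io.StringIO as a stand-in for a real file object, and make_file is a hypothetical helper used only to get a fresh "file" for each approach.

```python
import io


def make_file():
    # Assumption for the demo: io.StringIO stands in for open(some_path).
    return io.StringIO("line one\nline two\n")


# 1. Explicit loop with append.
reclist_loop = []
for line in make_file():
    reclist_loop.append(line)

# 2. List comprehension.
reclist_comp = [line for line in make_file()]

# 3. readlines().
reclist_readlines = make_file().readlines()

# Each element keeps its trailing newline; strip it if it is not wanted.
stripped = [line.rstrip("\n") for line in make_file()]

print(reclist_loop == reclist_comp == reclist_readlines)
print(stripped)
```

All three produce a list of lines, not sentences, and every element still ends with its newline character.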

EDIT:

Well, considering Jochen Ritzel's remark, you want

from nltk.tokenize import sent_tokenize
f = open(infile)
reclist = sent_tokenize(f.read())  # a flat list of sentences for the whole file
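
The snippet above never closes the file. One possible way to handle that is a with block, sketched below. read_sentences is a hypothetical helper, not part of NLTK, and the tokenizer is passed in as an argument so the demo runs without NLTK; with NLTK installed one would call read_sentences(path, sent_tokenize). The whitespace splitter in the demo is only a placeholder.

```python
import os
import tempfile


def read_sentences(path, tokenize):
    """Hypothetical helper: read the whole file and return tokenize(text).

    The with block closes the file even if tokenize raises an exception.
    """
    with open(path, encoding="utf-8") as f:
        return tokenize(f.read())


# Demo with a throwaway file and a trivial stand-in tokenizer (assumption:
# splitting on whitespace, which is NOT what sent_tokenize does).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("One. Two.\nThree.")
    demo_path = tmp.name

demo_sentences = read_sentences(demo_path, lambda text: text.split())
os.remove(demo_path)  # clean up the temporary file

print(demo_sentences)
```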
