Put sentences into a list - Python
I understand that nltk can split sentences and print them out using the following code, but how do I put the sentences into a list instead of outputting them to the screen?
import nltk.data
from nltk.tokenize import sent_tokenize
import os, sys, re, glob
cwd = './extract_en'  # os.getcwd()
for infile in glob.glob(os.path.join(cwd, 'fileX.txt')):
    (PATH, FILENAME) = os.path.split(infile)
    read = open(infile)
    for line in read:
        sent_tokenize(line)
The sent_tokenize(line) call prints it out. How do I put it into a list?
Here's a simplified version that I used to test the code:
import nltk.data
from nltk.tokenize import sent_tokenize
import sys
infile = open(sys.argv[1])
slist = []
for line in infile:
    slist.append(sent_tokenize(line))
print slist
infile.close()
When called like so, it prints the following:
me@mine:~/src/ $ python nltkplay.py nltkplay.py
[['import nltk.data\n'], ['from nltk.tokenize import sent_tokenize\n'], ['import sys\n'], ['infile = open(sys.argv[1])\n'], ['slist = []\n'], ['for line in infile:\n'], [' slist.append(sent_tokenize(line))\n'], ['print slist\n'], ['\n']]
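The snippet above uses the Python 2 print statement. A minimal Python 3 sketch of the same loop is shown here, assuming two stand-ins so it runs without NLTK's Punkt model: io.StringIO plays the role of open(sys.argv[1]), and naive_sent_tokenize is a hypothetical, much cruder regex substitute for sent_tokenize (unlike the NLTK output above, it strips trailing newlines).

```python
import io
import re


def naive_sent_tokenize(text):
    # Hypothetical stand-in for nltk.tokenize.sent_tokenize: split after
    # '.', '!' or '?' followed by whitespace, dropping empty pieces.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# io.StringIO stands in for open(sys.argv[1]).
infile = io.StringIO("One. Two!\nThree?\n")
slist = []
for line in infile:
    # Each line contributes one list of sentences.
    slist.append(naive_sent_tokenize(line))
print(slist)  # [['One.', 'Two!'], ['Three?']]
infile.close()
```

In real use, replace naive_sent_tokenize with sent_tokenize and the StringIO with an opened file.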
When doing something like this, a list comprehension is more concise and IMO more pleasant to read:
slist = [sent_tokenize(line) for line in infile]
To clarify, the above returns a list of lists of sentences, one list of sentences for each line. If you want a flat list of sentences, do this instead, as eyquem suggests:
slist = sent_tokenize(infile.read())
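A per-line list of lists can also be flattened with itertools.chain.from_iterable. The Python 3 sketch below compares that with tokenizing the whole text at once; naive_sent_tokenize is again a hypothetical regex stand-in for NLTK's sent_tokenize. The results differ when a sentence is wrapped across lines: tokenizing line by line cuts it in two, while reading the whole file keeps it together.

```python
import io
import itertools
import re


def naive_sent_tokenize(text):
    # Hypothetical stand-in for nltk.tokenize.sent_tokenize.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# The second sentence is wrapped across two lines.
TEXT = "First sentence. Second sentence\ncontinues here.\n"

# Per line, then flattened: the wrapped sentence is split into two pieces.
per_line = [naive_sent_tokenize(line) for line in io.StringIO(TEXT)]
flat_from_lines = list(itertools.chain.from_iterable(per_line))

# Whole text at once, like sent_tokenize(infile.read()).
flat_from_read = naive_sent_tokenize(io.StringIO(TEXT).read())

print(flat_from_lines)  # ['First sentence.', 'Second sentence', 'continues here.']
print(flat_from_read)   # ['First sentence.', 'Second sentence\ncontinues here.']
```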
Don't use a name like read for an object in your program: it is not a Python keyword, but it is the name of the file method read(), so reusing it makes the code confusing.
If you want to append to a list, you must first create one:
reclist = []
for line in f:
    reclist.append(line)
or with a list comprehension:
reclist = [line for line in f]
or using the file method readlines():
reclist = f.readlines()
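The three variants build the same list of lines. A small Python 3 check, using io.StringIO as a stand-in for an open file; each variant gets a fresh file object because iterating over a file consumes it.

```python
import io

TEXT = "line one\nline two\n"

# 1. Explicit loop with append.
f = io.StringIO(TEXT)
reclist_loop = []
for line in f:
    reclist_loop.append(line)

# 2. List comprehension.
f = io.StringIO(TEXT)
reclist_comp = [line for line in f]

# 3. The file method readlines().
f = io.StringIO(TEXT)
reclist_readlines = f.readlines()

print(reclist_loop)  # ['line one\n', 'line two\n']
print(reclist_loop == reclist_comp == reclist_readlines)  # True
```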
Or maybe I didn't understand what you want.
EDIT:
Well, considering Jochen Ritzel's remark, you want:
from nltk.tokenize import sent_tokenize
with open(infile) as f:
    reclist = sent_tokenize(f.read())
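Applied to the glob loop from the question, a Python 3 sketch could collect one flat sentence list per file in a dictionary keyed by file name. The helper sentences_by_file and its tokenizer parameter are hypothetical; in real use you would pass nltk.tokenize.sent_tokenize, while the default here is a naive regex stand-in so the sketch runs without NLTK.

```python
import glob
import os
import re
import tempfile


def naive_sent_tokenize(text):
    # Hypothetical stand-in for nltk.tokenize.sent_tokenize.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentences_by_file(directory, pattern="*.txt", tokenizer=naive_sent_tokenize):
    """Return {file name: list of sentences} for files matching pattern."""
    result = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        filename = os.path.basename(path)
        # 'with' closes the file automatically.
        with open(path, encoding="utf-8") as f:
            result[filename] = tokenizer(f.read())
    return result


# Usage: a temporary directory stands in for './extract_en'.
with tempfile.TemporaryDirectory() as cwd:
    with open(os.path.join(cwd, "fileX.txt"), "w", encoding="utf-8") as out:
        out.write("Hello world. This is a test.\n")
    found = sentences_by_file(cwd, "fileX.txt")

print(found)  # {'fileX.txt': ['Hello world.', 'This is a test.']}
```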