Python 3.x - Two Word Generator from a column in csv file
I have a huge csv file that has a column with descriptions of user problems. Something like:
1. "Please reset my password - User name is xxxx"
2. "My phone voicemail is not working"
3. "I have a broken desk"
I am trying to create a generator in Python that reads this column and yields two-word phrases. So, in the above example, it should create a generator like this: ('Please reset', 'reset my', 'my password', 'password -',.... 'My phone', 'phone voicemail',... 'I have', 'have a'....)
Note that I am looking to create only generators, not lists, because the file is huge. I can create a generator with the words ('Please', 'reset', 'my', 'password'...), but I am not able to concatenate words.
I am using: word = (word for row in csv.reader(f) for word in row[3].lower().split()) to create the generator with words.
listofwords = [words[i]+" "+words[i+1] for i in range(len(words)-1)]
You're looking for a Rolling or sliding window iterator. The accepted answer to that question is reproduced below, though I suggest reading through the other answers there:
from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result
So for every line, we can get the window iterator over that line, then use chain to flatten them into a single iterator.
import csv
from itertools import chain

with open('file.txt') as f:
    r = csv.reader(f)
    descriptions = (line[3].lower().split() for line in r)
    iterators = map(window, descriptions)
    final = chain.from_iterable(iterators)
    for item in final:
        print(item)
For the file
,,,a b c
,,,d e f
this would print
('a', 'b')
('b', 'c')
('d', 'e')
('e', 'f')
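The output above consists of tuples, whereas the question asked for two-word strings such as 'please reset'. One possible way to get those is to join each window with a space inside a generator function, which stays lazy from end to end. This is only a sketch: the helper name two_word_phrases, its column parameter, and the in-memory sample data are made up for illustration and are not part of the original answer.

```python
import csv
import io
from itertools import islice


def window(seq, n=2):
    """Yield successive width-n tuples from an iterable (same recipe as above)."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result


def two_word_phrases(rows, column=3):
    """Hypothetical helper: lazily yield 'word1 word2' strings per row."""
    for row in rows:
        words = row[column].lower().split()
        # Rows with fewer than two words produce no pairs at all.
        for pair in window(words):
            yield " ".join(pair)


# In-memory stand-in for the CSV file (assumed data); a real file object
# could be passed to csv.reader in exactly the same way.
sample = io.StringIO(
    ",,,Please reset my password\n"
    ",,,Hello\n"
    ",,,I have a broken desk\n"
)
for phrase in two_word_phrases(csv.reader(sample)):
    print(phrase)
```

Because pairs are built per row, a phrase never spans two descriptions, and the one-word row "Hello" contributes nothing. Only one row is held in memory at a time, which should suit a huge file.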