
Python 3.x - Two Word Generator from a column in csv file

I have a huge CSV file that has a column with descriptions of user problems, something like:

1. "Please reset my password - User name is xxxx"
2. "My phone voicemail is not working"
3. "I have a broken desk"

I am trying to write Python code that reads this column and creates a generator of two-word phrases. So, for the above example, it should produce something like this: ('Please reset', 'reset my', 'my password', 'password -', ... 'My phone', 'phone voicemail', ... 'I have', 'have a', ...)

Note that I am looking to create only generators, not lists, because the file is huge. I can create a generator of the individual words ('Please', 'reset', 'my', 'password', ...), but I am not able to combine adjacent words.

I am using: word = (word for row in csv.reader(f) for word in row[3].lower().split()) to create the generator of words.

listofwords = [words[i]+" "+words[i+1] for i in range(len(words)-1)]

You're looking for a rolling or sliding window iterator. The accepted answer to that question is reproduced below, though I suggest reading through the other answers there:

from itertools import islice

def window(seq, n=2):
    """Returns a sliding window (of width n) over data from the iterable.

    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
    """
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result
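
As a quick check of what window yields, here is a minimal sketch that applies it to one description from the question. The sample sentence and the names words, pairs and short are only illustrative, and the function is repeated so the snippet runs on its own.

```python
from itertools import islice

def window(seq, n=2):
    """Yield successive width-n tuples from an iterable."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

# Illustrative input: one description, split into words
words = "my phone voicemail is not working".split()
pairs = list(window(words))
print(pairs)

# A description with fewer than n words yields nothing
short = list(window(["desk"]))
print(short)
```

Because window only keeps the current tuple in memory, it works on any iterable, including a generator of words, without building a list first.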

So for every line, we can get the window iterator over that line, then use chain to flatten them into a single iterator.

import csv
from itertools import chain

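with open('file.txt', newline='') as f:  # newline='' is recommended for csv files
    r = csv.reader(f)
    # Lazily split the fourth column of each row into lowercase words
    descriptions = (line[3].lower().split() for line in r)
    # One window iterator per row; map is lazy in Python 3
    iterators = map(window, descriptions)
    # Flatten the per-row iterators into a single stream of pairs
    final = chain.from_iterable(iterators)
    for item in final:
        print(item)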

For the file

,,,a b c
,,,d e f

this would print

('a', 'b')
('b', 'c')
('d', 'e')
('e', 'f')
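
The question asked for strings such as 'please reset' rather than tuples. One way to get there, sketched under a few assumptions, is to join each tuple with a space. The helper name two_word_phrases is hypothetical, io.StringIO stands in for the real file, and column index 3 is taken from the question. Pairs are built per row, so no phrase spans two descriptions.

```python
import csv
import io
from itertools import chain, islice

def window(seq, n=2):
    """Yield successive width-n tuples from an iterable."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

def two_word_phrases(lines, column=3):
    """Lazily yield 'word1 word2' strings from one CSV column (hypothetical helper)."""
    rows = csv.reader(lines)
    descriptions = (row[column].lower().split() for row in rows)
    pairs = chain.from_iterable(map(window, descriptions))
    # Generator expression: nothing is read until the result is iterated
    return (" ".join(pair) for pair in pairs)

# In-memory stand-in for the two-line sample file
sample = io.StringIO(",,,a b c\n,,,d e f\n")
phrases = list(two_word_phrases(sample))
print(phrases)
```

With a real file, the generator has to be consumed inside the with block, while the file is still open.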

