How to extract words from each row of a CSV file in Python?

I have a very large .csv file (1065 rows x 1 column). Each row contains sentences. I want to pick out several important words from my word list (also a .csv file) in each row, and then compute the term frequency of those words for each row.

I have just tried to put something together; hopefully this helps. It could probably be done more efficiently, but it does the job.

Example input file

bla bla bla. bla! bla bla apple!, :banana. apple!!!
banana bla bla, apple and banana
peach 12345 bla bla peach and banana, peach, banana! :apple

Code

# Your inputs
list_words = ['apple', 'banana', 'peach']
filename = 'example.txt'

# Set of characters to remove before splitting each line into words
rm = ",:;?/-!."

# Read through the file line by line and count the words of interest
with open(filename, 'r') as fin:
    for count_line, line in enumerate(fin, 1):
        # Drop the punctuation characters; in Python 3, filter() returns an
        # iterator rather than a string, so build the cleaned string with join()
        clean_line = ''.join(ch for ch in line if ch not in rm)
        # To hold the counts of each word (every word starts at 0)
        words_frequency = {key: 0 for key in list_words}
        for w in clean_line.split():
            if w in words_frequency:
                words_frequency[w] += 1
        print('Line', count_line, ':', words_frequency)

Output:

Line 1 : {'apple': 2, 'peach': 0, 'banana': 1}
Line 2 : {'apple': 1, 'peach': 0, 'banana': 2}
Line 3 : {'apple': 1, 'peach': 3, 'banana': 2}
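The question's inputs are actually CSV files (one column of sentences plus a word-list file), while the code above reads a plain text file. Below is a minimal sketch, not the answerer's code, assuming the word list has one word per row in its first column with no header row, and that each row of the sentences file holds one sentence cell. It swaps in `str.translate` for the character filter and `collections.Counter` for the manual tally, then writes a term-frequency table with one column per word. The helper names (`load_word_list`, `row_frequencies`, `write_table`) and the file names in the comments (`wordlist.csv`, `sentences.csv`, `term_frequency.csv`) are hypothetical.

```python
import csv
import io
import sys
from collections import Counter

# Characters stripped before splitting a row into words (same set as `rm` above).
PUNCTUATION = ",:;?/-!."
# A deletion table: str.translate removes every listed character in one pass.
STRIP_TABLE = str.maketrans("", "", PUNCTUATION)

# Sample data mirroring the example input (sentence cells quoted as a CSV writer would).
SAMPLE_WORDS = "apple\nbanana\npeach\n"
SAMPLE_SENTENCES = (
    '"bla bla bla. bla! bla bla apple!, :banana. apple!!!"\n'
    '"banana bla bla, apple and banana"\n'
    '"peach 12345 bla bla peach and banana, peach, banana! :apple"\n'
)


def load_word_list(f):
    """Read the word list, assuming one word in the first column of each row."""
    return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]


def row_frequencies(f, words):
    """Yield (row_number, {word: count}) for each row of a one-column CSV."""
    wanted = set(words)
    for row_number, row in enumerate(csv.reader(f), 1):
        # Join all fields so unquoted commas that split a sentence are not lost.
        text = " ".join(row)
        counts = Counter(w for w in text.translate(STRIP_TABLE).split() if w in wanted)
        # Counter returns 0 for missing keys, so every word gets an entry.
        yield row_number, {w: counts[w] for w in words}


def write_table(out, words, freqs):
    """Write a term-frequency table: one row per input row, one column per word."""
    writer = csv.writer(out)
    writer.writerow(["row"] + list(words))
    for row_number, freq in freqs:
        writer.writerow([row_number] + [freq[w] for w in words])


if __name__ == "__main__":
    # Demo on the in-memory sample data.
    words = load_word_list(io.StringIO(SAMPLE_WORDS))
    write_table(sys.stdout, words, row_frequencies(io.StringIO(SAMPLE_SENTENCES), words))

    # With real files (hypothetical names); open CSV files with newline="":
    # with open("wordlist.csv", newline="") as wf:
    #     words = load_word_list(wf)
    # with open("sentences.csv", newline="") as sf, \
    #         open("term_frequency.csv", "w", newline="") as out:
    #     write_table(out, words, row_frequencies(sf, words))
```

For the three sample lines, the demo writes the header `row,apple,banana,peach` followed by `1,2,1,0`, `2,1,2,0` and `3,1,2,3`, which are the same counts as the output above. Like the original, matching is case-sensitive, so `Apple` would not be counted unless the text is lowercased before splitting.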
