Iterating through a file and processing it

Question

I want a function that reads through a file and counts certain things in it. So far I have this:

import csv
def get_stats(train_file, valid_pfile = "cmu-phonemes.txt", valid_graphemes = 
{'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '_'}):

    invalid_row = 0
    valid_row = 0
    phonemes_count = 0
    graphemes_count = 0
    underscore_count = 0

    csv_open = open(train_file)
    reader = csv.reader(csv_open)

    with open(valid_pfile) as valid_p:
        valid_pset = set(line.strip() for line in valid_p)
    valid_gset = set(valid_graphemes)

As you might suspect, I want to count specific kinds of items. The counting itself is not necessarily hard; the problem is that I cannot figure out how to iterate through the file so that I can count as I go.

Here is a sample file:

phonemes,graphemes
W IY K D EY,w ee k d ay
T EH K S T,t e x _ t
Y UW,ewe _
SH UW T,chu te
SH UW T,chu te
SH UW T,chu te !
SX AH K,s u ck

The question is: how do I iterate through the file and split each line at the "," in the middle (CSV format), so that I end up with something like

[["SH", "UW", "T"],["chu", "te"]]

Or something similar that I can loop over and check.

Answer 1

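file_output = []
# Read the CSV training file (train_file), not the phoneme list (valid_pfile)
with open(train_file, 'r') as f:
    for line in f.readlines()[1:]:  # skip the first line, which is the header
        # split the row at the comma, then split each half on whitespace
        file_output.append([v.split() for v in line.split(',')])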

After this block, the value of file_output will be:

[[['W', 'IY', 'K', 'D', 'EY'], ['w', 'ee', 'k', 'd', 'ay']],
 [['T', 'EH', 'K', 'S', 'T'], ['t', 'e', 'x', '_', 't']],
 [['Y', 'UW'], ['ewe', '_']],
 [['SH', 'UW', 'T'], ['chu', 'te']],
 [['SH', 'UW', 'T'], ['chu', 'te']],
 [['SH', 'UW', 'T'], ['chu', 'te', '!']],
 [['SX', 'AH', 'K'], ['s', 'u', 'ck']]]
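
Since the question already imports csv, the same structure can also be built with csv.reader. The following is only a sketch: read_pairs is a hypothetical helper name, and the sample data is pasted in as a string (via io.StringIO) so that the snippet runs on its own. With a real file you would pass the open file object instead.

```python
import csv
import io

# Sample data copied from the question; a real call would pass an open file instead.
SAMPLE = """phonemes,graphemes
W IY K D EY,w ee k d ay
T EH K S T,t e x _ t
Y UW,ewe _
SH UW T,chu te
SH UW T,chu te
SH UW T,chu te !
SX AH K,s u ck
"""


def read_pairs(lines):
    """Return [[phonemes, graphemes], ...] from an iterable of CSV lines."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row, if there is one
    # csv.reader yields [] for blank lines, so "if row" drops them
    return [[field.split() for field in row] for row in reader if row]


pairs = read_pairs(io.StringIO(SAMPLE))
print(pairs[3])  # [['SH', 'UW', 'T'], ['chu', 'te']]
```

Letting csv.reader do the comma splitting also handles quoted fields, which a plain line.split(',') would not.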

Now you can use this however you want.
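
As one possible way to use it, here is a sketch of the counting step. Several things in it are assumptions, not taken from the question: the validity rule (a row counts as valid only if every phoneme is in the phoneme set and every character of every grapheme chunk is in the grapheme set), the decision to count only valid rows, and the small VALID_PHONEMES set, which is a stand-in for the contents of cmu-phonemes.txt.

```python
import csv
import io
import string

# Hypothetical stand-in: the real phoneme set would be read from cmu-phonemes.txt.
VALID_PHONEMES = {"W", "IY", "K", "D", "EY", "T", "EH", "S", "Y", "UW", "SH", "AH"}
VALID_GRAPHEMES = set(string.ascii_lowercase) | {"_"}

SAMPLE = """phonemes,graphemes
W IY K D EY,w ee k d ay
T EH K S T,t e x _ t
Y UW,ewe _
SH UW T,chu te
SH UW T,chu te
SH UW T,chu te !
SX AH K,s u ck
"""


def get_stats(lines, valid_pset=VALID_PHONEMES, valid_gset=VALID_GRAPHEMES):
    """Count valid/invalid rows and the tokens found in the valid rows."""
    stats = {"valid_row": 0, "invalid_row": 0, "phonemes_count": 0,
             "graphemes_count": 0, "underscore_count": 0}
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) != 2:  # wrong number of columns (includes blank lines)
            stats["invalid_row"] += 1
            continue
        phonemes, graphemes = row[0].split(), row[1].split()
        # Assumed rule: every phoneme and every grapheme character must be known.
        row_ok = (all(p in valid_pset for p in phonemes)
                  and all(ch in valid_gset for g in graphemes for ch in g))
        if not row_ok:
            stats["invalid_row"] += 1
            continue
        stats["valid_row"] += 1
        stats["phonemes_count"] += len(phonemes)
        stats["graphemes_count"] += len(graphemes)
        stats["underscore_count"] += graphemes.count("_")
    return stats


stats = get_stats(io.StringIO(SAMPLE))
print(stats)
```

With these stand-in sets, the row containing "!" and the row containing "SX" are the two that get rejected. If your definition of a valid row differs, only the row_ok expression needs to change.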

Solution 1 (accepted, 2016-05-01 09:36:57)
