
Using Python to find the most common value(s) in a column of a CSV file

for each in column_names:
    print each + ':'
    for L in range(1,len(row_list)):
        each_column = columns[each][L]
        for i in each_column:
            if i == i.index(i)+1:
                count+=1
                mode=i

The above code is my attempt to find the most common values in a column of a CSV file. The code is incomplete, and I've been stuck for hours trying to get it right.

I'm very new to Python; even the syntax is unfamiliar to me. Any help would be appreciated.

This code will do the trick:

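import csv
from collections import Counter

filename = 'test.csv'
# newline='' is what the csv module recommends when opening files (Python 3)
with open(filename, 'r', newline='') as f:
    # Lazily yield the first field of every non-empty row
    column = (row[0] for row in csv.reader(f) if row)
    # most_common(1) returns a one-item list of (value, count) pairs
    print("Most frequent value: {0}".format(Counter(column).most_common(1)[0][0]))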

First, it opens your file; then it creates a generator expression that retrieves the first column of your CSV.

The reason for not building a list is that the CSV file can be very long, which could cause memory problems. A generator produces each item only when it is consumed.

Then it uses a collections.Counter object to count how often each value occurs and takes the value of the first, most common element. You can run the code step by step to see the output of each step.
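
To follow those steps one at a time, here is a minimal sketch that runs the same pipeline on an in-memory CSV instead of a file on disk. The sample data and the helper name most_common_values are made up for illustration. Since the question asks for the most common value(s), the helper returns every value tied for first place rather than just one.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a real CSV file.
SAMPLE = "red,1\nblue,2\nred,3\ngreen,4\nblue,5\n"

def most_common_values(lines, column_index=0):
    """Return (values, count): all values tied for the highest count in one column."""
    # Generator expression: rows are read lazily, one at a time.
    # Rows too short to contain the requested column are skipped.
    column = (row[column_index] for row in csv.reader(lines) if len(row) > column_index)
    counts = Counter(column)
    if not counts:
        return [], 0
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top], top

values, count = most_common_values(io.StringIO(SAMPLE))
print(values, count)
```

Passing an open file object instead of io.StringIO should work the same way, because csv.reader accepts any iterable of lines.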

You're only using one count variable when you actually need to count each value separately. So while your overall approach of iterating through the dictionary you appear to have populated from the CSV file was quite good, you need to set up another dictionary to hold the counts for each value. Since you can't use any of the nice methods from collections.Counter or collections.defaultdict, you could do something like this:

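# Assumes column_names and columns (column name -> list of values) from the question
counts = {}
for each in column_names:
    count = {}  # value -> number of occurrences in this column
    print(each + ':')
    for value in columns[each]:
        # dict.get returns 0 the first time a value is seen
        count[value] = count.get(value, 0) + 1
    counts[each] = count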

After that, you'll have a dictionary counts with one entry per column_name, containing all the values in that column as keys and their counts as values. Now you just need to sort those by count and output the n most common ones.
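
The final sorting step could look roughly like the sketch below, which avoids the collections module entirely. The contents of counts and the helper name top_n are invented for illustration; in practice counts would be the dictionary built by the loop above.

```python
# Hypothetical example of the counts dictionary:
# one inner dictionary per column, mapping value -> number of occurrences.
counts = {
    "color": {"red": 3, "blue": 2, "green": 1},
    "size": {"S": 1, "M": 4, "L": 4},
}

def top_n(count, n):
    """Return the n (value, count) pairs with the highest counts."""
    # Sort by count, highest first; sorted() is stable, so ties keep their original order.
    ranked = sorted(count.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]

for name in counts:
    print(name + ':', top_n(counts[name], 2))
```

Note that slicing to n items can cut through a tie; if every tied value matters, compare each count against the highest one instead of slicing.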


 