
Using Python to find the most common value(s) in a column of a CSV file

for each in column_names:
    print each + ':'
    for L in range(1,len(row_list)):
        each_column = columns[each][L]
        for i in each_column:
            if i == i.index(i)+1:
                count+=1
                mode=i

The above code is my attempt to find the most common values in a column of a CSV file. The code is incomplete, and I've been stuck for hours trying to get it right.

I'm very new to Python, and even the syntax is unfamiliar to me. Any help would be greatly appreciated.

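This code will do the trick:

import csv
from collections import Counter

filename = 'test.csv'
with open(filename, 'r') as f:
    # Generator expression: yields the first field of each row, one at a time
    column = (row[0] for row in csv.reader(f))
    # most_common(1) returns a one-item list holding a (value, count) pair
    print("Most frequent value: {0}".format(Counter(column).most_common(1)[0][0]))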

First it opens your file, then it creates a generator expression that yields the first field of each row in your CSV.

The reason for not making it a list is that the CSV can be very long, which could cause memory problems. A generator produces each item only when it is requested, so the whole column is never held in memory at once.

Then it uses a collections.Counter object to count how often each value yielded by the generator occurs. most_common returns (value, count) pairs ordered from most to least frequent, and [0][0] picks the value out of the first pair. You can run the code step by step to see the output of every step.
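Since the question asks for the most common value(s), here is a minimal sketch of how ties could be reported as well, rather than only the first value. The function name most_common_values, its column_index parameter, and the sample data are made up for illustration; it assumes Python 3.7 or later and reads from an in-memory string instead of test.csv.

```python
import csv
import io
from collections import Counter


def most_common_values(csv_file, column_index=0):
    """Return (values, count): every value tied for the highest count in one column."""
    reader = csv.reader(csv_file)
    # Skip empty or short rows so row[column_index] cannot raise IndexError.
    counts = Counter(row[column_index] for row in reader if len(row) > column_index)
    if not counts:
        return [], 0
    top_count = max(counts.values())
    # Counter keeps first-seen order, so tied values come back in file order.
    values = [value for value, n in counts.items() if n == top_count]
    return values, top_count


# Made-up sample data standing in for a real file such as test.csv.
sample = io.StringIO("red,1\nblue,2\nred,3\nblue,4\ngreen,5\n")
modes, count = most_common_values(sample, column_index=0)
print(modes, count)  # prints ['red', 'blue'] 2
```

With a real file you would pass the object returned by open(filename) in place of the StringIO object.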

You're only using one count variable when you actually need to count each value separately. Your overall approach was quite good: you iterate through the dictionary you appear to have populated from the CSV file. What is missing is another dictionary to hold the count for each value. Since you can't use any of the nice methods from collections.Counter or collections.defaultdict, you could do something like:

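counts = {}                      # column name -> {value: count}
for each in column_names:
    count = {}                   # fresh tally for this column
    print(each + ':')
    for value in columns[each]:
        # get() returns 0 the first time a value is seen
        count[value] = count.get(value, 0) + 1
    counts[each] = count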

After that, you'll have a dictionary counts with one entry per column name, each holding the values in that column as keys and their counts as values. Now you just need to sort each column's entries by count and output the n most common ones.
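The sorting step could be sketched as follows, using only built-ins and no collections helpers. The helper name top_n and the sample counts dictionary are made up for illustration; the real counts would come from the loop above. It assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
def top_n(count, n=3):
    """Return the n (value, count) pairs with the highest counts."""
    # Sort by count, highest first; sorted() is stable, so ties keep dict order.
    ranked = sorted(count.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Made-up stand-in for the counts dictionary described above.
counts = {
    "colour": {"red": 3, "blue": 1, "green": 3},
    "size": {"S": 2, "M": 5},
}
for name in counts:
    print(name + ":", top_n(counts[name], n=2))
```

For the sample data this prints the two most frequent values per column, with tied values listed in the order they were first counted.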
