Python - CSV - All the permutations of each row of numbers into tuples

Question

I am very much a novice at Python, but learning. I have been tasked at work to take a CSV of data (2500 rows) in the following format (as we cant figure out how to do it in Excel):

 RefNumber      Reviewer 1  Reviewer 2  Reviewer 3  Reviewer 4  Reviewer 5
  9676/2            6           3           2
  0526/4            6           3           1           5           1
  1842/1            5           3           1           5   
  2693/3            5           5           1           2   
  2515/1            6           3           1           5           3
  2987/1            4           1           3
  3841/1            4           3           1 
  3402/1            4           3           1           5

And produce a CSV with each average of all the permutations of numbers that you could get from each row (minimum of 3).

ie

3841/1 above would produce the tuple of {4,3,1}, and an average of 2.7

3402/1 above would produce the tuples of {4,3,1}, {4,3,1,5}, {3,1,5},{4,1,5} etc with an average of 2.7, 3.3, 3, 3.3 etc.

I am wracking my brain trying to think of the best way of doing this, as I also need to know of each average, how many numbers in the tuple did it contain ie {4,3,1} would produce an average of 2.7 and the count of numbers of that tuple is 3.

Essentially what I want to produce is this:

RefNumber      Avg 1     Avg 2       Avg 3       Avg 4   Avg 5
  3841/1        2.7         
  3402/1        2.7       3.3           3         3.5

But I guess to show the count of the numbers in the tuple, I could run it 9 times (there is a maximum of 12 reviews) and just have each datasheet on its own tab.

I technically also need the standard deviation of each tuple and the range of scores, but this is already going wayyyyy past my expertise so I guess I can maybe drop that or do it manually somehow.

Any idea on where to start with this?

Answer 1

You can use csv module to read through csv and extract the data and the itertools module to get all the combinations. see if its doing the job. Also I left the average values as it is but I see you are just working with 1 decimal point which you can easily get by rounding off the results. Guess you can save the result now.

from itertools import combinations as cb 
import csv
with open("test.csv") as f:
    reader=csv.reader(f)
    next(reader, None)  # skip header
    data=[filter(None,i) for i in reader]

def avgg(x):
    ll=[float(i) for i in x[1:]] #take review no and convert to float
    n=len(ll)
    avg_list=[x[0]]  #start result list with ref no.
    for i in range(3,n+1):
        for j in cb(ll,i):
            # print(j)  #see the combination
            avg_list.append(sum(j)/i)
    return avg_list

for x in data:
    print(avgg(x))

Answer 2

I upvoted the last answer, but I'd thought I show you an example that keeps everything in the DataFrame

data = """RefNumber, Reviewer 1, Reviewer 2,Reviewer 3,Reviewer 4,Reviewer 5
9676/2,6,3,2,,
0526/4,6,3,1,5,1
1842/1,5,3,1,5,
2693/3,5,5,1,2,
2515/1,6,3,1,5,3
2987/1,4,1,3,,
3841/1,4,3,1,,
3402/1,4,3,1,5,
"""

import pandas
import itertools
import StringIO
import numpy

buffer = StringIO.StringIO(data)
df = pandas.read_csv(buffer, index_col=0)

# EVERYTHING ABOVE IS MOSTLY SETUP CODE FOR THE EXAMPLE
def get_combos(items, lower_bound=3):
    """
    Return all combinations of values of size lower_bound
    for items
    """
    usable = items.dropna()
    combos = list()
    n_combos = range(lower_bound, len(usable) + 1)
    for r in n_combos:
        combos += list(itertools.combinations(usable, r))
    return combos

df['combos'] = df.apply(get_combos, axis=1)
df['means'] = df['combos'].map(lambda items: [numpy.mean(x) for x in items])

Answer 3

Related Q&A regarding subset generation:

Python - CSV - All the permutations of each row of numbers into tuples

Question

3 answers

solution1
3 ACCPTED 2017-05-08 14:28:37

solution2
1 2017-05-08 14:52:10

solution3
1 2017-05-08 15:04:14

Python - CSV - All the permutations of each row of numbers into tuples

Question

3 answers

solution1 3 ACCPTED 2017-05-08 14:28:37

solution2 1 2017-05-08 14:52:10

solution3 1 2017-05-08 15:04:14

solution1
3 ACCPTED 2017-05-08 14:28:37

solution2
1 2017-05-08 14:52:10

solution3
1 2017-05-08 15:04:14