
Python - CSV - All the permutations of each row of numbers into tuples

I am very much a novice at Python, but learning. I have been tasked at work with taking a CSV of data (2,500 rows) in the following format (we can't figure out how to do it in Excel):

 RefNumber      Reviewer 1  Reviewer 2  Reviewer 3  Reviewer 4  Reviewer 5
  9676/2            6           3           2
  0526/4            6           3           1           5           1
  1842/1            5           3           1           5   
  2693/3            5           5           1           2   
  2515/1            6           3           1           5           3
  2987/1            4           1           3
  3841/1            4           3           1 
  3402/1            4           3           1           5   

And produce a CSV with the average of every combination of numbers (order does not matter) that you could get from each row, with a minimum of 3 numbers per combination.

For example:

3841/1 above would produce the tuple of {4,3,1}, and an average of 2.7

3402/1 above would produce the tuples {4,3,1}, {4,3,1,5}, {3,1,5}, {4,1,5} etc., with averages of 2.7, 3.3, 3, 3.3 etc.

I am racking my brain trying to think of the best way of doing this, as for each average I also need to know how many numbers the tuple contained, i.e. {4,3,1} would produce an average of 2.7, and the count of numbers in that tuple is 3.

Essentially what I want to produce is this:

RefNumber      Avg 1     Avg 2       Avg 3       Avg 4   Avg 5
  3841/1        2.7         
  3402/1        2.7       3.3           3         3.3   

But I guess to show the count of the numbers in the tuple, I could run it 9 times (there is a maximum of 12 reviews) and just have each datasheet on its own tab.

I technically also need the standard deviation of each tuple and the range of scores, but this is already going wayyyyy past my expertise so I guess I can maybe drop that or do it manually somehow.

Any idea on where to start with this?

You can use the csv module to read the file and extract the data, and the itertools module to get all the combinations. See if the code below does the job. I left the averages unrounded, but I see you are working to 1 decimal place, which you can easily get by rounding the results. From there you can save the result.

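from itertools import combinations
import csv

with open("test.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    # drop empty cells; build real lists, because filter() is a lazy iterator in Python 3
    data = [[cell for cell in row if cell.strip()] for row in reader]

def avgg(x):
    scores = [float(i) for i in x[1:]]  # review scores, converted to float
    n = len(scores)
    avg_list = [x[0]]  # start the result list with the reference number
    for size in range(3, n + 1):  # combination sizes from 3 up to all scores
        for combo in combinations(scores, size):
            # print(combo)  # uncomment to see each combination
            avg_list.append(sum(combo) / size)
    return avg_list

for x in data:
    print(avgg(x))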
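The question also asks for the count of numbers in each tuple, the standard deviation, and the range of scores. Here is a minimal sketch of one way to get them with the standard library, not a definitive implementation. The function name `combo_stats`, the output column names, and rounding to two decimals are all assumptions made for illustration, and the standard deviation is assumed to be the sample one (`statistics.stdev`). The two input rows are typed in by hand from the question, and the CSV is written to an in-memory buffer instead of a real file.

```python
import csv
import io
import statistics
from itertools import combinations

def combo_stats(ref, scores, min_size=3):
    """Yield one record per combination of at least min_size scores."""
    for size in range(min_size, len(scores) + 1):
        for combo in combinations(scores, size):
            yield {
                "RefNumber": ref,
                "Count": size,  # how many scores are in this combination
                "Scores": " ".join(format(s, "g") for s in combo),
                "Mean": round(statistics.mean(combo), 2),
                "StdDev": round(statistics.stdev(combo), 2),  # sample std dev (assumption)
                "Range": max(combo) - min(combo),
            }

# Two rows copied by hand from the question, for illustration only
rows = [["3841/1", "4", "3", "1"], ["3402/1", "4", "3", "1", "5"]]

fields = ["RefNumber", "Count", "Scores", "Mean", "StdDev", "Range"]
out = io.StringIO()  # in-memory stand-in for a real output file
writer = csv.DictWriter(out, fieldnames=fields)
writer.writeheader()
for ref, *cells in rows:
    scores = [float(c) for c in cells]
    writer.writerows(combo_stats(ref, scores))

print(out.getvalue())
```

Writing one row per combination, with the count in its own column, avoids needing a separate tab per tuple size: the result can be filtered or pivoted on `Count` afterwards.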

I upvoted the previous answer, but I thought I'd show you an example that keeps everything in a DataFrame.

data = """RefNumber, Reviewer 1, Reviewer 2,Reviewer 3,Reviewer 4,Reviewer 5
9676/2,6,3,2,,
0526/4,6,3,1,5,1
1842/1,5,3,1,5,
2693/3,5,5,1,2,
2515/1,6,3,1,5,3
2987/1,4,1,3,,
3841/1,4,3,1,,
3402/1,4,3,1,5,
"""

import pandas
import itertools
import io  # in Python 3, StringIO lives in the io module
import numpy

buffer = io.StringIO(data)
df = pandas.read_csv(buffer, index_col=0)

# EVERYTHING ABOVE IS MOSTLY SETUP CODE FOR THE EXAMPLE
def get_combos(items, lower_bound=3):
    """
    Return all combinations of the non-missing values in items,
    for every size from lower_bound up to the number of values.
    """
    usable = items.dropna()
    combos = list()
    n_combos = range(lower_bound, len(usable) + 1)
    for r in n_combos:
        combos += list(itertools.combinations(usable, r))
    return combos

df['combos'] = df.apply(get_combos, axis=1)
df['means'] = df['combos'].map(lambda items: [numpy.mean(x) for x in items])
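Lists stored inside DataFrame cells are awkward to export, so a possible follow-up is a long-format table with one row per combination, which also gives room for the count, standard deviation, and range asked for in the question. This is only a sketch under assumptions: `long_df` and the column names are made up for illustration, the sample standard deviation (`ddof=1`) is assumed, and the input is cut down to two rows from the question so the block runs on its own.

```python
import io
import itertools

import numpy
import pandas

# Two rows from the question, reduced to four reviewer columns for brevity
csv_text = """RefNumber,Reviewer 1,Reviewer 2,Reviewer 3,Reviewer 4
3841/1,4,3,1,
3402/1,4,3,1,5
"""
df = pandas.read_csv(io.StringIO(csv_text), index_col=0)

records = []
for ref, row in df.iterrows():
    scores = row.dropna().tolist()  # ignore missing reviews
    for size in range(3, len(scores) + 1):
        for combo in itertools.combinations(scores, size):
            records.append({"RefNumber": ref, "Count": size, "Scores": combo})

long_df = pandas.DataFrame(records)
long_df["Mean"] = long_df["Scores"].map(numpy.mean)
# ddof=1 gives the sample standard deviation (an assumption about what is wanted)
long_df["StdDev"] = long_df["Scores"].map(lambda c: numpy.std(c, ddof=1))
long_df["Range"] = long_df["Scores"].map(lambda c: max(c) - min(c))

print(long_df)
# long_df.to_csv("combo_stats.csv", index=False)  # hypothetical output file name
```

Filtering on `Count` (for example `long_df[long_df["Count"] == 3]`) then yields the per-size sheets mentioned in the question.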
