Find columns where values are greater than column-wise mean

Question

How can I print, for each row, the column headers whose values are greater than the mean (or median) of their column?

For example, given this DataFrame df:

    a   b   c   d
0  12  11  13  45
1   6  13  12  23
2   5  12   6  35

the output should be:

0: a, c, d
1: b, c
2: d
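
A minimal sketch for reproducing the example. It assumes the frame was built with a plain `pd.DataFrame` constructor (the question only shows the printed table), then collects, for each row, the headers whose value is strictly greater than that column's mean.

```python
import pandas as pd

# Example data; this constructor call is an assumption, the question only shows the printed frame
df = pd.DataFrame(
    {"a": [12, 6, 5], "b": [11, 13, 12], "c": [13, 12, 6], "d": [45, 23, 35]}
)

# Boolean mask: True where a value is strictly greater than its column's mean
mask = df.gt(df.mean())

# For each row label, keep the column headers whose mask entry is True
result = {idx: df.columns[row.to_numpy()].tolist() for idx, row in mask.iterrows()}
print(result)
```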

Answer 1

In [22]: df.gt(df.mean()).T.agg(lambda x: df.columns[x].tolist())
Out[22]:
0    [a, c, d]
1       [b, c]
2          [d]
dtype: object

or:

In [23]: df.gt(df.mean()).T.agg(lambda x: ', '.join(df.columns[x]))
Out[23]:
0    a, c, d
1       b, c
2          d
dtype: object
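
The question also mentions the median, which none of the answers shows. With the same `df.gt(...)` mask idea it should only require swapping `df.mean()` for `df.median()`. A sketch on the example data (the constructor is assumed, as the question only prints the frame); note that a row with no qualifying column yields an empty string.

```python
import pandas as pd

# Example data; constructor assumed from the printed table in the question
df = pd.DataFrame(
    {"a": [12, 6, 5], "b": [11, 13, 12], "c": [13, 12, 6], "d": [45, 23, 35]}
)

# Compare each value with its column's median instead of its mean
mask = df.gt(df.median())

# Join the qualifying headers per row; a row with no qualifying column gives ''
out = mask.apply(lambda row: ", ".join(df.columns[row.to_numpy()]), axis=1)
print(out)
```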

Answer 2

You can do this in pandas by reshaping the frame to long format. Step by step:

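long_df = df.reset_index().melt('index')  # one row per (row label, column, value)
long_df['MEAN'] = long_df.groupby('variable')['value'].transform('mean')  # mean of each original column
long_df[long_df.value > long_df.MEAN].groupby('index').variable.apply(list)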

Out[1016]: 
index
0    [a, c, d]
1       [b, c]
2          [d]
Name: variable, dtype: object
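
To see what the reshaping step produces, a sketch on the example data (constructor assumed): `reset_index()` turns the row labels into a column named `index`, and `melt('index')` stacks the remaining columns into `variable`/`value` pairs, one row per original cell, which is what lets `groupby('variable')` compute a per-column mean.

```python
import pandas as pd

# Example data; constructor assumed from the printed table in the question
df = pd.DataFrame(
    {"a": [12, 6, 5], "b": [11, 13, 12], "c": [13, 12, 6], "d": [45, 23, 35]}
)

# One row per (row label, column name, value) triple
long_df = df.reset_index().melt("index")
print(long_df.columns.tolist())  # ['index', 'variable', 'value']
print(long_df.shape)             # (12, 3)
print(long_df.head(3))
```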

Answer 3

Use df.apply to build a boolean mask, then iterate over its rows and use each one to index into df.columns:

mask = df.apply(lambda x: x > x.mean())  # True where a value exceeds its column's mean
out = [(i, ', '.join(df.columns[x])) for i, x in mask.iterrows()]
print(out)
[(0, 'a, c, d'), (1, 'b, c'), (2, 'd')]

Answer 4

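from collections import defaultdict
import numpy as np

d = defaultdict(list)
v = df.values
# np.where gives the (row, column) positions where a value exceeds its column mean
for r, c in zip(*np.where(v > v.mean(0))):
    d[df.index[r]].append(df.columns[c])
dict(d)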

{0: ['a', 'c', 'd'], 1: ['b', 'c'], 2: ['d']}
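
This answer relies on NumPy broadcasting: `v.mean(0)` has shape `(4,)`, so `v > v.mean(0)` compares every row of the 3×4 array with the column means, and `np.where` returns the row and column positions of the True entries in row-major order, which is why each row's headers come out in column order. A small sketch on the example values (the array literal is assumed from the table in the question):

```python
import numpy as np

# Example values from the question (assumed), one row per DataFrame row
v = np.array([[12, 11, 13, 45],
              [6, 13, 12, 23],
              [5, 12, 6, 35]])

col_means = v.mean(axis=0)   # shape (4,): one mean per column
mask = v > col_means         # broadcast: each row compared with col_means
rows, cols = np.where(mask)  # positions of True entries, in row-major order

print(mask.shape)            # (3, 4)
print(rows.tolist())         # [0, 0, 0, 1, 1, 2]
print(cols.tolist())         # [0, 2, 3, 1, 2, 3]
```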

Answers (4)

Answer 1: score 3, accepted, posted 2017-08-29 21:25:05
Answer 2: score 2, posted 2017-08-29 21:15:24
Answer 3: score 1, posted 2017-08-29 21:14:34
Answer 4: score 1, posted 2017-08-29 21:35:27
