How do I convert my 2D numpy array to a pandas dataframe with given categories?

Question

I have an array called 'values' which features 2 columns of mean reaction time data from 10 individuals. The first column refers to data collected for a single individual in condition A, the second for that same individual in condition B:

array([[451.75      , 488.55555556],
   [552.44444444, 590.40740741],
   [629.875     , 637.62962963],
   [454.66666667, 421.88888889],
   [637.16666667, 539.94444444],
   [538.83333333, 516.33333333],
   [463.83333333, 448.83333333],
   [429.2962963 , 497.16666667],
   [524.66666667, 458.83333333]])

I would like to plot these data using seaborn, to display the mean values and connected single values for each individual across the two conditions. What is the simplest way to convert the array 'values' into a 3 column DataFrame, whereby one column features all the values, another features a label distinguishing that value as condition A or condition B, and a final column which provides a number for each individual (ie, 1-10)? For example, as follows:

Value    Condition    Individual
451.75   A            1
488.56   B            1
488.55   A            2

...etc

Answer 1

This should work

import pandas as pd
import numpy as np
np_array = np.array([[451.75      , 488.55555556],
   [552.44444444, 590.40740741],
   [629.875     , 637.62962963],
   [454.66666667, 421.88888889],
   [637.16666667, 539.94444444],
   [538.83333333, 516.33333333],
   [463.83333333, 448.83333333],
   [429.2962963 , 497.16666667],
   [524.66666667, 458.83333333]])
pd_df = pd.DataFrame(np_array, columns=["A", "B"])
num_individuals = len(pd_df.index)
pd_df = pd_df.melt()
pd_df["INDIVIDUAL"] = [(i)%(num_individuals) + 1 for i in pd_df.index]
pd_df
   variable       value  INDIVIDUAL
0         A  451.750000           1
1         A  552.444444           2
2         A  629.875000           3
3         A  454.666667           4
4         A  637.166667           5
5         A  538.833333           6
6         A  463.833333           7
7         A  429.296296           8
8         A  524.666667           9
9         B  488.555556           1
10        B  590.407407           2
11        B  637.629630           3
12        B  421.888889           4
13        B  539.944444           5
14        B  516.333333           6
15        B  448.833333           7
16        B  497.166667           8
17        B  458.833333           9

Answer 2

`melt`

You can do that using pd.melt :

pd.DataFrame(data, columns=['A','B']).reset_index().melt(id_vars = 'index')\
    .rename(columns={'index':'Individual'})

 Individual variable       value
0            0        A  451.750000
1            1        A  552.444444
2            2        A  629.875000
3            3        A  454.666667
4            4        A  637.166667
5            5        A  538.833333
6            6        A  463.833333
7            7        A  429.296296
8            8        A  524.666667
9            0        B  488.555556
10           1        B  590.407407
11           2        B  637.629630
12           3        B  421.888889
13           4        B  539.944444
14           5        B  516.333333
15           6        B  448.833333
16           7        B  497.166667
17           8        B  458.833333

How do I convert my 2D numpy array to a pandas dataframe with given categories?

Question

2 answers

solution1
3 2018-11-29 14:21:41

solution2
1 ACCPTED 2018-11-29 14:28:15

`melt`

How do I convert my 2D numpy array to a pandas dataframe with given categories?

Question

2 answers

solution1 3 2018-11-29 14:21:41

solution2 1 ACCPTED 2018-11-29 14:28:15

melt

solution1
3 2018-11-29 14:21:41

solution2
1 ACCPTED 2018-11-29 14:28:15

`melt`