简体   繁体   中英

How do I convert my 2D numpy array to a pandas dataframe with given categories?

I have an array called 'values' which features 2 columns of mean reaction time data from 10 individuals. The first column refers to data collected for a single individual in condition A, the second for that same individual in condition B:

array([[451.75      , 488.55555556],
   [552.44444444, 590.40740741],
   [629.875     , 637.62962963],
   [454.66666667, 421.88888889],
   [637.16666667, 539.94444444],
   [538.83333333, 516.33333333],
   [463.83333333, 448.83333333],
   [429.2962963 , 497.16666667],
   [524.66666667, 458.83333333]])

I would like to plot these data using seaborn, to display the mean values and connected single values for each individual across the two conditions. What is the simplest way to convert the array 'values' into a 3 column DataFrame, whereby one column features all the values, another features a label distinguishing that value as condition A or condition B, and a final column which provides a number for each individual (ie, 1-10)? For example, as follows:

Value    Condition    Individual
451.75   A            1
488.56   B            1
488.55   A            2

...etc

This should work

import pandas as pd
import numpy as np
np_array = np.array([[451.75      , 488.55555556],
   [552.44444444, 590.40740741],
   [629.875     , 637.62962963],
   [454.66666667, 421.88888889],
   [637.16666667, 539.94444444],
   [538.83333333, 516.33333333],
   [463.83333333, 448.83333333],
   [429.2962963 , 497.16666667],
   [524.66666667, 458.83333333]])
pd_df = pd.DataFrame(np_array, columns=["A", "B"])
num_individuals = len(pd_df.index)
pd_df = pd_df.melt()
pd_df["INDIVIDUAL"] = [(i)%(num_individuals) + 1 for i in pd_df.index]
pd_df
   variable       value  INDIVIDUAL
0         A  451.750000           1
1         A  552.444444           2
2         A  629.875000           3
3         A  454.666667           4
4         A  637.166667           5
5         A  538.833333           6
6         A  463.833333           7
7         A  429.296296           8
8         A  524.666667           9
9         B  488.555556           1
10        B  590.407407           2
11        B  637.629630           3
12        B  421.888889           4
13        B  539.944444           5
14        B  516.333333           6
15        B  448.833333           7
16        B  497.166667           8
17        B  458.833333           9

melt

You can do that using pd.melt :

pd.DataFrame(data, columns=['A','B']).reset_index().melt(id_vars = 'index')\
    .rename(columns={'index':'Individual'})

 Individual variable       value
0            0        A  451.750000
1            1        A  552.444444
2            2        A  629.875000
3            3        A  454.666667
4            4        A  637.166667
5            5        A  538.833333
6            6        A  463.833333
7            7        A  429.296296
8            8        A  524.666667
9            0        B  488.555556
10           1        B  590.407407
11           2        B  637.629630
12           3        B  421.888889
13           4        B  539.944444
14           5        B  516.333333
15           6        B  448.833333
16           7        B  497.166667
17           8        B  458.833333

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM