Concatenate numpy arrays to two arrays using pandas, numpy or other

Question

I have a series of numpy arrays generated for example like this:

import random
import pandas as pd
N = 5
data = [[random.random() for i in range(N)] for j in range(N)]
names = ['a','b','c','d','e']
df = pd.DataFrame(data)
df = df.transpose()
df.columns = names

i.e.:

a    b    c    d    e
0.01 0.03 0.01 0.2  0.04
0.2  0.01 0.02 0.01 0.1
...

and I would like to format it so that it looks like this:

name    value
a       0.01
b       0.03
c       0.01
d       0.2
e       0.04
a       0.2
b       0.01
....

(order of data is not important)

I have tried pandas dataframe transpose:

df = pd.DataFrame(data)
df = df.transpose()
df.columns = names

but the result looks like this:

a    0.1   0.2  0.01 0.2
b    0.3   0.1  0.2  0.01
....

Any idea on how to reformat the numpy arrays/pandas dataframe to have two columns of data?

Answer 1

You can use numpy.tile to repeat the column names and numpy.ravel to flatten the values of the DataFrame:

import numpy as np
import pandas as pd

# random dataframe
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=list('ABCDE'))
print(df)
   A  B  C  D  E
0  8  8  3  7  7
1  0  4  2  5  2
2  2  2  1  0  8
3  4  0  9  6  2
4  4  1  5  3  4

df2 = pd.DataFrame({
        "name": np.tile(df.columns, len(df.index)),
        "value": df.values.ravel()})
print(df2)
   name  value
0     A      8
1     B      8
2     C      3
3     D      7
4     E      7
5     A      0
6     B      4
7     C      2
8     D      5
9     E      2
10    A      2
11    B      2
12    C      1
13    D      0
14    E      8
15    A      4
16    B      0
17    C      9
18    D      6
19    E      2
20    A      4
21    B      1
22    C      5
23    D      3
24    E      4
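
The pairing above works because ravel flattens the values row by row (row-major order) by default, so repeating the full list of column names once per row lines up each value with the column it came from. A minimal sketch that checks this on a small frame; the 2x3 data and the name long_df are illustrative assumptions, not part of the answer:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (assumption: any 2D numeric frame behaves the same way).
df = pd.DataFrame([[8, 8, 3], [0, 4, 2]], columns=list("ABC"))

values = df.values.ravel()            # row-major flattening: row 0 first, then row 1
names = np.tile(df.columns, len(df))  # the column labels repeated once per row

# Every flattened position maps back to (row, col); the label at that
# position must be the label of column `col`.
for position, (name, value) in enumerate(zip(names, values)):
    row, col = divmod(position, df.shape[1])
    assert df.columns[col] == name
    assert df.iat[row, col] == value

long_df = pd.DataFrame({"name": names, "value": values})
print(long_df)
```

If the two arrays were flattened in different orders, the assertions inside the loop would fail, which makes this a cheap sanity check before applying the same idea to a large frame.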

Timings (len(df) = 1M):

#random dataframe
np.random.seed(100)
N = 1000000
df = pd.DataFrame(np.random.randint(10, size=(N,5)), columns=list('abcde'))
print (df)

In [86]: %timeit (pd.DataFrame({"name": np.tile(df.columns, len(df.index)),"value": df.values.ravel()}))
10 loops, best of 3: 84.8 ms per loop

In [87]: %timeit (pd.DataFrame(np.column_stack((np.tile(df.columns, df.shape[0]), df.values.reshape(-1,1))), columns=['name', 'value']))
10 loops, best of 3: 196 ms per loop

In [88]: %timeit (df.stack().reset_index(level=0, drop=True).reset_index(name='value').rename(columns={'index':'name'}))
1 loop, best of 3: 344 ms per loop
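
The %timeit lines above need IPython. To repeat the comparison from a plain script, something like the following sketch could be used. It relies on the standard library's timeit module; the function names (tile_ravel, column_stack, stack_reset, best_time) and the smaller row count are assumptions made for illustration, and absolute numbers will differ between machines and library versions.

```python
import timeit

import numpy as np
import pandas as pd


def tile_ravel(df):
    # Variant from In [86]: tile the column names, ravel the values.
    return pd.DataFrame({"name": np.tile(df.columns, len(df.index)),
                         "value": df.values.ravel()})


def column_stack(df):
    # Variant from In [87]: build one 2D object array, then wrap it.
    return pd.DataFrame(
        np.column_stack((np.tile(df.columns, df.shape[0]),
                         df.values.reshape(-1, 1))),
        columns=["name", "value"])


def stack_reset(df):
    # Variant from In [88]: pure pandas via stack and reset_index.
    return (df.stack()
              .reset_index(level=0, drop=True)
              .reset_index(name="value")
              .rename(columns={"index": "name"}))


def best_time(func, df, repeat=3):
    # Best wall-clock time in seconds over `repeat` single runs.
    return min(timeit.repeat(lambda: func(df), number=1, repeat=repeat))


np.random.seed(100)
N = 100_000  # assumption: smaller than the 1M rows used above, to keep the run short
df = pd.DataFrame(np.random.randint(10, size=(N, 5)), columns=list("abcde"))

for func in (tile_ravel, column_stack, stack_reset):
    print(f"{func.__name__}: {best_time(func, df) * 1000:.1f} ms")
```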

If you need a NumPy array as output, use numpy.column_stack:

print(np.column_stack((np.tile(df.columns, len(df.index)), df.values.ravel())))
[['a' 8]
 ['b' 8]
 ['c' 3]
 ['d' 7]
 ['e' 7]
 ['a' 0]
 ['b' 4]
 ['c' 2]
 ['d' 5]
 ['e' 2]
 ['a' 2]
 ['b' 2]
 ['c' 1]
 ['d' 0]
 ['e' 8]
 ['a' 4]
 ['b' 0]
 ['c' 9]
 ['d' 6]
 ['e' 2]
 ['a' 4]
 ['b' 1]
 ['c' 5]
 ['d' 3]
 ['e' 4]]

Answer 2

Is this what you want?

In [11]: df
Out[11]:
          a         b         c         d         e
0  0.791796  0.428642  0.887860  0.803709  0.860545
1  0.230401  0.105232  0.617007  0.557678  0.590459
2  0.448462  0.314422  0.207188  0.785642  0.022271
3  0.075631  0.707029  0.111538  0.769387  0.174297
4  0.707566  0.299966  0.197642  0.145841  0.231135

In [12]: df.stack().reset_index(level=0, drop=True).reset_index()
Out[12]:
   index         0
0      a  0.791796
1      b  0.428642
2      c  0.887860
3      d  0.803709
4      e  0.860545
5      a  0.230401
6      b  0.105232
7      c  0.617007
8      d  0.557678
9      e  0.590459
10     a  0.448462
11     b  0.314422
12     c  0.207188
13     d  0.785642
14     e  0.022271
15     a  0.075631
16     b  0.707029
17     c  0.111538
18     d  0.769387
19     e  0.174297
20     a  0.707566
21     b  0.299966
22     c  0.197642
23     d  0.145841
24     e  0.231135
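
The result above has the columns index and 0 rather than name and value. A minimal sketch of one way to get the requested headers, using the same reset_index(name=...) and rename chain that appears in the timings of Answer 1; the seed and the variable name long_df are illustrative assumptions:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only so the run is repeatable
df = pd.DataFrame(np.random.rand(5, 5), columns=list("abcde"))

long_df = (
    df.stack()                             # Series indexed by (row, column label)
      .reset_index(level=0, drop=True)     # drop the row level, keep the labels
      .reset_index(name="value")           # labels become a column called "index"
      .rename(columns={"index": "name"})   # give that column the requested header
)
print(long_df.head())
```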

Answer 3

You just need to concatenate all the columns of df. Because the columns have different names, you first need to give them all the same name; otherwise pandas keeps each one as a separate column in the concat result.

import random
import pandas as pd

N = 5
data = [[random.random() for i in range(N)] for j in range(N)]
names = ['a','b','c','d','e']

df = pd.DataFrame(data)
df.columns = names
df = df.transpose()
print(df)

#           0         1         2         3         4
# a  0.643042  0.061476  0.415979  0.209272  0.394414
# b  0.175363  0.580336  0.056173  0.468121  0.388956
# c  0.096257  0.570860  0.516667  0.892087  0.956790
# d  0.082906  0.340805  0.466074  0.010123  0.293006
# e  0.430240  0.759413  0.083779  0.442159  0.434603

df_col = [df[[i]].copy() for i in df.columns]  # separate the columns of df (copies, so renaming is safe)
for col in df_col:
    col.columns = ['value']                    # give every piece the same column name

res = pd.concat(df_col)                     # concat them all together
res.index.names=['name']

print(res)

#          value
# name          
# a     0.643042
# b     0.175363
# c     0.096257
# d     0.082906
# e     0.430240
# a     0.061476
# b     0.580336
# c     0.570860
# d     0.340805
# e     0.759413
# a     0.415979
# b     0.056173
# c     0.516667
# d     0.466074
# e     0.083779
# a     0.209272
# b     0.468121
# c     0.892087
# d     0.010123
# e     0.442159
# a     0.394414
# b     0.388956
# c     0.956790
# d     0.293006
# e     0.434603
