convert pandas df to multi-dimensional numpy array

Question

I have very sparse data in a pandas DataFrame with 25 million+ records. It has to be converted into a multidimensional NumPy array. I have written this the straightforward way, using a for loop, and was wondering whether there is a more efficient way.

import numpy as np
import pandas as pd

facts_pd = pd.DataFrame.from_records(columns=['name','offset','code'],
    data=[('John', -928, 'dx_434'), ('Steve',-757,'dx_5859'), ('Jack',-800,'dx_250'),
          ('John',-919,'dx_401'),('John',-956,'dx_5859')])

name_lu = pd.DataFrame(sorted(facts_pd['name'].unique()), columns=['name'])
name_lu["nameid"] = name_lu.index

offset_lu = pd.DataFrame(sorted(facts_pd['offset'].unique(), reverse=True), columns=['offset'])
offset_lu["offsetid"] = offset_lu.index

code_lu = pd.DataFrame(sorted(facts_pd['code'].unique()), columns=['code'])
code_lu["codeid"] = code_lu.index

facts_pd = pd.merge(pd.merge(pd.merge(facts_pd, name_lu, how="left", on="name")
    , offset_lu, how="left", on="offset"), code_lu, how="left", on="code")
facts_pd.drop(["name","offset","code"], inplace=True, axis=1)

facts_np = np.zeros((len(name_lu),len(offset_lu),len(code_lu)))
for row in facts_pd.iterrows():
    i,j,k = row[1]
    facts_np[i][j][k] = 1

Answer 1

The command you are probably looking for is dataframe.as_matrix(). Despite its name, it returns a NumPy array, not a matrix. Here is the documentation for it. (as_matrix() was deprecated in pandas 0.23 and removed in 1.0; DataFrame.to_numpy() or .values is the current equivalent.)

Here is another Stack Overflow topic on its use as well.
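
As a rough sketch of how a NumPy-based approach might replace the loop in the question: each column can be turned into integer ids, and the three id arrays can then index the 3-D array in a single assignment. This is not the method from the answer above. It assumes pd.factorize for building the ids (instead of the lookup tables and merges in the question) and negates offset so that the ids follow the question's descending order.

```python
import numpy as np
import pandas as pd

facts_pd = pd.DataFrame.from_records(
    columns=['name', 'offset', 'code'],
    data=[('John', -928, 'dx_434'), ('Steve', -757, 'dx_5859'),
          ('Jack', -800, 'dx_250'), ('John', -919, 'dx_401'),
          ('John', -956, 'dx_5859')])

# factorize(sort=True) returns integer ids plus the sorted unique values.
name_ids, names = pd.factorize(facts_pd['name'], sort=True)
code_ids, codes = pd.factorize(facts_pd['code'], sort=True)

# The question sorts offsets in descending order; sorting the negated
# values ascending gives the same id assignment.
offset_ids, neg_offsets = pd.factorize(-facts_pd['offset'], sort=True)
offsets = -neg_offsets

# One vectorized assignment instead of a Python-level loop over rows.
facts_np = np.zeros((len(names), len(offsets), len(codes)))
facts_np[name_ids, offset_ids, code_ids] = 1
```

names, offsets and codes play the role of the lookup tables in the question, so an index such as facts_np[1, 3, 2] can be mapped back to ('John', -928, 'dx_434'). For data this sparse, the dense array may not fit in memory at full scale; a smaller dtype (for example np.uint8) would reduce the footprint.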

Answer 2

Refurbished code:

import numpy as np
import pandas as pd

facts_pd = pd.DataFrame.from_records(columns=['name','offset','code'],
    data=[('John', -928, 'dx_434'), ('Steve',-757,'dx_5859'), ('Jack',-800,'dx_250'),
          ('John',-919,'dx_401'),('John',-956,'dx_5859')])

# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement.
facts_np = facts_pd.to_numpy()

print(facts_np)  # Displays the data frame in NumPy array format.
