Move index values into column names in pandas Data Frame

I'm trying to reshape a multi-indexed data frame so that the values from the second level of the index are incorporated into the column names in the new data frame. In the data frame below, I want to move A and B from "source" into the columns so that I have s1_A, s1_B, s2_A, ..., s3_B.

I've tried creating the structure of the new data frame explicitly and populating it with a nested for loop to reassign the values, but it is excruciatingly slow. I've tried a number of functions from the pandas API, but without much luck. Any help would be much appreciated.

import numpy as np
import pandas as pd
midx = pd.MultiIndex.from_product([[1, 2, 3], ['A', 'B']], names=["sample", "source"])
df = pd.DataFrame(index=midx, columns=['s1', 's2', 's3'], data=np.ndarray(shape=(6, 3)))

>>> df
                s1   s2   s3
sample source               
1      A       1.2  3.4  5.6
       B       1.2  3.4  5.6
2      A       1.2  3.4  5.6
       B       1.2  3.4  5.6
3      A       1.2  3.4  5.6
       B       1.2  3.4  5.6


# Want to build a new data frame that looks like this:
>>> df_new
       s1_A   s1_B   s2_A   s2_B   s3_A   s3_B
sample                
1      1.2    1.2    3.4    3.4    5.6    5.6
2      1.2    1.2    3.4    3.4    5.6    5.6
3      1.2    1.2    3.4    3.4    5.6    5.6

Here's how I'm currently doing it. It's extremely slow, and I know there must be a more idiomatic way to do this with pandas, but I'm still new to its API:

substances = df.columns.values
sources = ['A','B']
subst_and_src = sorted([ subst + "_" + src for src in sources for subst in substances ])

df_new = pd.DataFrame(index=df.index.unique(0), columns=subst_and_src)

# Runs forever
for (sample, source) in df.index:
    for subst in df.columns:
        df_new.loc[sample, subst + "_" + source] = df.loc[(sample, source), subst]
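
Unstack the "source" level of the index into the columns, then join the two column levels with an underscore:

df = df.unstack(level=1)
df.columns = ['_'.join(col) for col in df.columns.values]
print(df)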

Prints (the values are arbitrary because np.ndarray allocates uninitialized memory):

                 s1_A           s1_B  s2_A  s2_B           s3_A           s3_B
sample                                                                        
1       4.665045e-310  6.904071e-310   0.0   0.0  6.903913e-310  2.121996e-314
2       6.904071e-310   0.000000e+00   0.0   0.0  3.458460e-323   0.000000e+00
3        0.000000e+00   0.000000e+00   0.0   0.0   0.000000e+00   0.000000e+00
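
For a reproducible check, here is a minimal sketch of the same unstack-and-join approach with deterministic values. Note that np.tile is swapped in for the question's np.ndarray as an assumption so the output is predictable, and the name wide is only illustrative:

```python
import numpy as np
import pandas as pd

# Same index layout as in the question
midx = pd.MultiIndex.from_product(
    [[1, 2, 3], ["A", "B"]], names=["sample", "source"]
)

# Assumption: deterministic rows instead of uninitialized memory
data = np.tile([1.2, 3.4, 5.6], (6, 1))
df = pd.DataFrame(data, index=midx, columns=["s1", "s2", "s3"])

# Move the "source" level from the row index into the columns
wide = df.unstack(level="source")

# Each column label is now a tuple such as ("s1", "A"); join it into "s1_A"
wide.columns = ["_".join(col) for col in wide.columns]

print(wide)
```

Referring to the level by name ("source") rather than by position keeps the call readable if the index levels are ever reordered.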

Unstack into a new data frame, then collapse the resulting column MultiIndex with a format string (str.format):

df1 = df.unstack()
df1.columns = df1.columns.map('{0[0]}_{0[1]}'.format)
print(df1)

        s1_A  s1_B  s2_A  s2_B  s3_A  s3_B
sample                                    
1        1.2   1.2   3.4   3.4   5.6   5.6
2        1.2   1.2   3.4   3.4   5.6   5.6
3        1.2   1.2   3.4   3.4   5.6   5.6
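
One practical difference between joining and formatting: '_'.join only accepts strings, so it raises a TypeError if an index level holds integers, while string formatting converts each value for you. The sketch below illustrates this with a hypothetical frame (df_int, with integer "source" labels 10 and 20, is made up for this example and is not from the question):

```python
import pandas as pd

# Hypothetical data: the second index level holds integers, not strings
midx = pd.MultiIndex.from_product(
    [[1, 2], [10, 20]], names=["sample", "source"]
)
df_int = pd.DataFrame(
    {"s1": [1.0, 2.0, 3.0, 4.0], "s2": [5.0, 6.0, 7.0, 8.0]}, index=midx
)

wide_int = df_int.unstack("source")

# "_".join(col) would fail here because 10 and 20 are not strings;
# an f-string converts each level value to text before combining them
wide_int.columns = [f"{subst}_{src}" for subst, src in wide_int.columns]

print(wide_int)
```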
