[英]Move index values into column names in pandas Data Frame
I'm trying to reshape a multi-indexed data frame so that the values from the second level of the index are incorporated into the column names in the new data frame.我正在尝试重塑多索引数据框,以便将第二级索引中的值合并到新数据框中的列名中。 In the data frame below, I want to move A and B from "source" into the columns so that I have s1_A, s1_B, s2_A, ..., s3_B.在下面的数据框中,我想将 A 和 B 从“源”移动到列中,以便我有 s1_A、s1_B、s2_A、...、s3_B。
I've tried creating the structure of the new data frame explicitly and populating it with a nested for loop to reassign the values, but it is excruciatingly slow.我尝试过显式创建新数据框的结构,并使用嵌套的 for 循环填充它以重新分配值,但速度非常慢。 I've tried a number of functions from the pandas API, but without much luck.我已经尝试了 Pandas API 中的许多功能,但运气不佳。 Any help would be much appreciated.任何帮助将非常感激。
midx = pd.MultiIndex.from_product( [[1,2,3], ['A','B']], names=["sample","source"])
df = pd.DataFrame( index=midx, columns=['s1', 's2', 's3'], data=np.ndarray(shape=(6,3)) )
>>> df
s1 s2 s3
sample source
1 A 1.2 3.4 5.6
B 1.2 3.4 5.6
2 A 1.2 3.4 5.6
B 1.2 3.4 5.6
3 A 1.2 3.4 5.6
B 1.2 3.4 5.6
# Want to build a new data frame thatlooks like this:
>>> df_new
s1_A s1_B s2_A s2_B s3_A s3_B
sample
1 1.2 1.2 3.4 3.4 5.6 5.6
2 1.2 1.2 3.4 3.4 5.6 5.6
3 1.2 1.2 3.4 3.4 5.6 5.6
Here's how I'm currently doing it.这是我目前的做法。 It's extremely slow, and I know there must be a more idiomatic way to do this with pandas, but I'm still new to its API:它非常慢,我知道必须有一种更惯用的方法来使用 Pandas 来做到这一点,但我仍然不熟悉它的 API:
substances = df.columns.values
sources = ['A','B']
subst_and_src = sorted([ subst + "_" + src for src in sources for subst in substances ])
df_new = pd.DataFrame(index=df.index.unique(0), columns=subst_and_src)
# Runs forever
for (sample, source) in df.index:
for subst in df.columns:
df_new[sample, subst + "_" + source] = df.loc[(sample,source), subst]
df = df.unstack(level=1)
df.columns = ['_'.join(col).strip() for col in df.columns.values]
print(df)
Prints:印刷:
s1_A s1_B s2_A s2_B s3_A s3_B
sample
1 4.665045e-310 6.904071e-310 0.0 0.0 6.903913e-310 2.121996e-314
2 6.904071e-310 0.000000e+00 0.0 0.0 3.458460e-323 0.000000e+00
3 0.000000e+00 0.000000e+00 0.0 0.0 0.000000e+00 0.000000e+00
Unstack
into a new dataframe and collapse multilevel index of resulting frmae using f string
Unstack
到一个新的数据帧并使用f string
折叠结果帧的多级索引
df1= df.unstack()
df1.columns = df1.columns.map('{0[0]}_{0[1]}'.format)
s1_A s1_B s2_A s2_B s3_A s3_B
sample
1 1.2 1.2 3.4 3.4 5.6 5.6
2 1.2 1.2 3.4 3.4 5.6 5.6
3 1.2 1.2 3.4 3.4 5.6 5.6
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.