Python Pandas DataFrame: how to pivot
Dear amazing hackers of the world,
I'm a newbie and can't figure out which python/pandas function can achieve the "transformation" I want. Showing you what I have ("original") and what kind of result I want ("desired") is better than a lengthy description (I think and hope).
import pandas as pd
df_orig = pd.DataFrame()
df_orig["Treatment"] = ["C", "C", "D", "D", "C", "C", "D", "D"]
df_orig["TimePoint"] = [24, 48, 24, 48, 24, 48, 24, 48]
df_orig["AN"] = ["ALF234","ALF234","ALF234","ALF234","XYK987","XYK987","XYK987","XYK987"]
df_orig["Bincode"] = [33,33,33,33,44,44,44,44]
df_orig["BC_all"] = ["33.7","33.7","33.7","33.7","44.9","44.9","44.9","44.9"]
df_orig["RIA_avg"] = [0.202562419159333,0.281521224788666, 0.182828319454333,0.294909088002333,
0.105941322218833,0.247949961707,0.1267545610749,0.159711714967666]
df_orig["sum14N_avg"] = [4120031.79121666,3742633.37033333,4659315.47073666,4345668.76408666,
26307312.1188333,24089229.9177999,35367286.7322666,34093045.3129]
df_wanted = pd.DataFrame()
df_wanted["AN"] = ["ALF234","XYK987"]
df_wanted["Bincode"] = [33,44]
df_wanted["BC_all"] = ["33.7","44.9"]
df_wanted["C_24_RIA_avg"] = [0.202562419159333, 0.105941322218833]
df_wanted["C_48_RIA_avg"] = [0.281521224788666,0.247949961707]
df_wanted["D_24_RIA_avg"] = [0.182828319454333,0.1267545610749]
df_wanted["D_48_RIA_avg"] = [0.294909088002333, 0.159711714967666]
df_wanted["C_24_sum14N_avg"] = [4120031.791, 26307312.12]
df_wanted["C_48_sum14N_avg"] = [3742633.37, 24089229.92]
df_wanted["D_24_sum14N_avg"] = [4659315.471, 35367286.73]
df_wanted["D_48_sum14N_avg"] = [4345668.764, 34093045.31]
Thank you very much for your support!!
I believe you want to pivot this using pd.pivot_table. See the examples on pivot tables to understand better how this works.
The following should give you what you want.
df_wanted = pd.pivot_table(
    df_orig,
    index=['AN', 'Bincode', 'BC_all'],
    # the column is spelled 'TimePoint' in df_orig; 'Timepoint' raises a KeyError
    columns=['Treatment', 'TimePoint'],
    values=['RIA_avg', 'sum14N_avg']
)
Note that the column names will not be transformed exactly as you stated in your output. Instead, there will be a hierarchical index on both the columns and the rows, which should be more convenient to work with.
Getting rows/columns/values out of this format is possible by using .loc:
df_wanted.loc['XYK987', :]
df_wanted.loc[:, ('sum14N_avg')]
df_wanted.loc['ALF234', ('RIA_avg', 'C', 24)]
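If the flat column names from the question (for example C_24_RIA_avg) are needed after all, the hierarchical columns can be joined into strings. The following is a minimal sketch, not part of the original answer: it rebuilds a stand-in for df_orig with rounded values, and the names pivoted and flat are chosen here for illustration only.

```python
import pandas as pd

# Stand-in for df_orig from the question (values rounded for brevity).
df_orig = pd.DataFrame({
    "Treatment": ["C", "C", "D", "D", "C", "C", "D", "D"],
    "TimePoint": [24, 48, 24, 48, 24, 48, 24, 48],
    "AN": ["ALF234"] * 4 + ["XYK987"] * 4,
    "Bincode": [33] * 4 + [44] * 4,
    "BC_all": ["33.7"] * 4 + ["44.9"] * 4,
    "RIA_avg": [0.20, 0.28, 0.18, 0.29, 0.11, 0.25, 0.13, 0.16],
    "sum14N_avg": [4.1e6, 3.7e6, 4.7e6, 4.3e6, 2.6e7, 2.4e7, 3.5e7, 3.4e7],
})

pivoted = pd.pivot_table(
    df_orig,
    index=["AN", "Bincode", "BC_all"],
    columns=["Treatment", "TimePoint"],
    values=["RIA_avg", "sum14N_avg"],
)

# Each column label is a tuple (value name, treatment, time point);
# join the parts in the order the question asks for.
flat = pivoted.copy()
flat.columns = [
    f"{treatment}_{timepoint}_{value}"
    for value, treatment, timepoint in pivoted.columns
]

# Turn the row index levels (AN, Bincode, BC_all) back into ordinary columns.
flat = flat.reset_index()
print(flat.columns.tolist())
```

Whether to flatten is a matter of taste: the hierarchical form is easier to slice with .loc, while the flat form is easier to export to CSV or Excel.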
Your output is not aligned properly, so this is hard to follow. But it looks like a job for df.groupby('AN').mean() or something like that. Read the docs on Group By.
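For what it's worth, grouping by 'AN' alone would average over Treatment and TimePoint instead of spreading them into columns, and depending on the pandas version it may also drop or reject the string columns. A groupby-based route to the wide layout therefore needs all five key columns plus an unstack. This is only a sketch under that assumption, using a rounded stand-in for the question's data; the names keys, grouped and wide are illustrative.

```python
import pandas as pd

# Stand-in for df_orig from the question (values rounded for brevity).
df = pd.DataFrame({
    "Treatment": ["C", "C", "D", "D", "C", "C", "D", "D"],
    "TimePoint": [24, 48, 24, 48, 24, 48, 24, 48],
    "AN": ["ALF234"] * 4 + ["XYK987"] * 4,
    "Bincode": [33] * 4 + [44] * 4,
    "BC_all": ["33.7"] * 4 + ["44.9"] * 4,
    "RIA_avg": [0.20, 0.28, 0.18, 0.29, 0.11, 0.25, 0.13, 0.16],
    "sum14N_avg": [4.1e6, 3.7e6, 4.7e6, 4.3e6, 2.6e7, 2.4e7, 3.5e7, 3.4e7],
})

# Group by every identifying column, averaging only the numeric measurements.
keys = ["AN", "Bincode", "BC_all", "Treatment", "TimePoint"]
grouped = df.groupby(keys)[["RIA_avg", "sum14N_avg"]].mean()

# Move Treatment and TimePoint from the row index into the columns.
wide = grouped.unstack(["Treatment", "TimePoint"])
print(wide)
```

The result has the same hierarchical layout as the pd.pivot_table call in the other answer, so the two approaches are interchangeable for this data.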