Construct pandas DataFrame from list of tuples of (row,col,values)

I have a list of tuples like

data = [
('r1', 'c1', avg11, stdev11),
('r1', 'c2', avg12, stdev12),
('r2', 'c1', avg21, stdev21),
('r2', 'c2', avg22, stdev22)
]

and I would like to put them into a pandas DataFrame with rows named by the first column and columns named by the second column. It seems the way to take care of the row names is something like pandas.DataFrame([x[1:] for x in data], index = [x[0] for x in data]), but how do I take care of the columns to get a 2x2 matrix (the output of that expression is 4x3)? Is there also a more intelligent way of taking care of the row labels, instead of explicitly omitting them?

EDIT: It seems I will need two DataFrames, one for averages and one for standard deviations. Is that correct? Or can I store a list of values in each "cell"?

You can pivot your DataFrame after creating it:

>>> import pandas as pd
>>> df = pd.DataFrame(data)
>>> df.pivot(index=0, columns=1, values=2)
# avg DataFrame
1      c1     c2
0               
r1  avg11  avg12
r2  avg21  avg22
>>> df.pivot(index=0, columns=1, values=3)
# stdev DataFrame
1        c1       c2
0                   
r1  stdev11  stdev12
r2  stdev21  stdev22
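
For a version that runs as written, here is a minimal sketch of the same pivot. The numeric values stand in for the avg/stdev placeholders in the question, and the column names 'row', 'col', 'avg' and 'std' are assumptions chosen for illustration; neither comes from the original post.

```python
import pandas as pd

# Made-up numbers standing in for avg11, stdev11, ... in the question.
data = [
    ('r1', 'c1', 1.0, 0.1),
    ('r1', 'c2', 2.0, 0.2),
    ('r2', 'c1', 3.0, 0.3),
    ('r2', 'c2', 4.0, 0.4),
]

# Illustrative column names (an assumption); any labels would work.
df = pd.DataFrame(data, columns=['row', 'col', 'avg', 'std'])

# One 2x2 table per statistic: rows from 'row', columns from 'col'.
avg = df.pivot(index='row', columns='col', values='avg')
std = df.pivot(index='row', columns='col', values='std')

print(avg)
print(std)
```

Note that pivot raises a ValueError if any (row, col) pair occurs more than once; in that case pivot_table, which aggregates duplicates, is probably the better fit.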

I submit that it is better to leave your data stacked as it is:

df = pandas.DataFrame(data, columns=['R_Number', 'C_Number', 'Avg', 'Std'])

# Possibly also this if these can always be the indexes:
# df = df.set_index(['R_Number', 'C_Number'])

Then it's a bit more intuitive to say:

df.set_index(['R_Number', 'C_Number']).Avg.unstack(level=1)

This way the code itself makes clear that you are reshaping the averages (or the standard deviations). With pivot and positional column labels, by contrast, which semantic entity you are reshaping depends purely on column convention.
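
On the EDIT in the question: a single DataFrame can probably hold both statistics, so two separate frames are not strictly needed. The following is a minimal sketch with made-up numbers in place of the question's placeholders. It unstacks the whole indexed frame rather than one column, which yields two-level column labels with the statistic on the outer level.

```python
import pandas as pd

# Made-up numbers standing in for avg11, stdev11, ... in the question.
data = [
    ('r1', 'c1', 1.0, 0.1),
    ('r1', 'c2', 2.0, 0.2),
    ('r2', 'c1', 3.0, 0.3),
    ('r2', 'c2', 4.0, 0.4),
]

df = pd.DataFrame(data, columns=['R_Number', 'C_Number', 'Avg', 'Std'])

# Unstack every value column at once; the columns become a MultiIndex
# of (statistic, C_Number) pairs.
wide = df.set_index(['R_Number', 'C_Number']).unstack(level=1)
print(wide)

# Selecting the outer level recovers the individual 2x2 tables.
avg = wide['Avg']
std = wide['Std']
```

Storing a list in each cell is also possible with object-dtype columns, but it gives up vectorised arithmetic, so the two-level columns are usually easier to work with.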

This is what I expected to see when I came to this question:

#!/usr/bin/env python

import pandas as pd


df = pd.DataFrame([(1, 2, 3, 4),
                   (5, 6, 7, 8),
                   (9, 0, 1, 2),
                   (3, 4, 5, 6)],
                  columns=list('abcd'),
                  index=['India', 'France', 'England', 'Germany'])
print(df)

gives

         a  b  c  d
India    1  2  3  4
France   5  6  7  8
England  9  0  1  2
Germany  3  4  5  6
