
Indexing pandas dataframes inside pandas dataframes with python

I have a series of dataframes inside a dataframe.

The top level dataframe is structured like this:

    24hr   48hr   72hr
D1  x      x      x
D2  x      x      x 
D3  x      x      x

In each case x is a dataframe created with pandas.read_excel().

One of the columns in each x dataframe has the title 'Average Vessels Length' and there are three entries (i.e. rows, indices) in that column.

What I want to return is the mean value for the column 'Average Vessels Length'. I'm also interested in how to return a particular cell in that column. I know there's a .mean method for pandas dataframes, but I can't figure out the indexing syntax to use it.

Below is an example:

import pandas as pd

a = {'Image name' : ['Image 1', 'Image 2', 'Image 3'], 'threshold' : [20, 25, 30], 'Average Vessels Length' : [14.2, 22.6, 15.7] }
b = pd.DataFrame(a, columns=['Image name', 'threshold', 'Average Vessels Length'])

c = pd.DataFrame(index=['D1','D2','D3'], columns=['24hr','48hr','72hr'])
c['24hr']['D1'] = a
c['48hr']['D1'] = a
c['72hr']['D1'] = a
c['24hr']['D2'] = a
c['48hr']['D2'] = a
c['72hr']['D2'] = a
c['24hr']['D3'] = a
c['48hr']['D3'] = a
c['72hr']['D3'] = a

This returns the mean of the values in the column 'Average Vessels Length':

print(b['Average Vessels Length'].mean())

This returns all the values in 24hr, D1, 'Average Vessels Length':

print(c['24hr']['D1']['Average Vessels Length'])

This doesn't work:

print(c['24hr']['D1']['Average Vessels Length'].mean())

And I can't figure out how to access any particular value in c['24hr']['D1']['Average Vessels Length'].

Ultimately I want to take the mean of 'Average Vessels Length' for each Dx in each column and divide it by the corresponding D1['Average Vessels Length'].mean().

Any help would be greatly appreciated.

I'm assuming that since you said each element of your big dataframe was a dataframe, your example data should have stored b (the DataFrame) in each cell rather than a (the dict):

import pandas as pd

a = {'Image name' : ['Image 1', 'Image 2', 'Image 3'], 'threshold' : [20, 25, 30], 'Average Vessels Length' : [14.2, 22.6, 15.7] }
b = pd.DataFrame(a, columns=['Image name', 'threshold', 'Average Vessels Length'])

c = pd.DataFrame(index=['D1','D2','D3'], columns=['24hr','48hr','72hr'])
c['24hr']['D1'] = b
c['48hr']['D1'] = b
c['72hr']['D1'] = b
c['24hr']['D2'] = b
c['48hr']['D2'] = b
c['72hr']['D2'] = b
c['24hr']['D3'] = b
c['48hr']['D3'] = b
c['72hr']['D3'] = b

To get the mean of each individual cell you can use applymap, which maps a function to each cell of the DataFrame (from pandas 2.1 onward the same method is called DataFrame.map, and applymap is deprecated):

cell_means = c.applymap(lambda e: e['Average Vessels Length'].mean())
cell_means
Out[14]: 
    24hr  48hr  72hr
D1  17.5  17.5  17.5
D2  17.5  17.5  17.5
D3  17.5  17.5  17.5
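
For newer pandas versions, the same per-cell mean can be sketched as follows. This is only an illustration under some assumptions: the outer frame is built from a NumPy object array instead of chained assignment, and the names cells, vessel_mean and elementwise are made up for this example. It also shows one way to reach a single value inside a nested frame, which the question asked about.

```python
import numpy as np
import pandas as pd

b = pd.DataFrame({
    'Image name': ['Image 1', 'Image 2', 'Image 3'],
    'threshold': [20, 25, 30],
    'Average Vessels Length': [14.2, 22.6, 15.7],
})

rows = ['D1', 'D2', 'D3']
cols = ['24hr', '48hr', '72hr']

# Assumption: fill an object array element by element so that every cell
# holds a whole DataFrame, then wrap it in the outer frame.
cells = np.empty((len(rows), len(cols)), dtype=object)
for i in range(len(rows)):
    for j in range(len(cols)):
        cells[i, j] = b.copy()
c = pd.DataFrame(cells, index=rows, columns=cols)


def vessel_mean(frame):
    """Mean of the 'Average Vessels Length' column of one nested frame."""
    return frame['Average Vessels Length'].mean()


# DataFrame.map exists from pandas 2.1; older versions only have applymap.
elementwise = c.map if hasattr(c, 'map') else c.applymap
cell_means = elementwise(vessel_mean)

# A single value: pick the nested frame first, then the column, then the row.
first_value = c.at['D1', '24hr']['Average Vessels Length'].iloc[0]

print(cell_means)
print(first_value)
```

Selecting the nested frame with c.at[row, column] first, and only then indexing into it, avoids the chained c['24hr']['D1'] lookup used in the question.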

And once you have those you can get the column means etc. and go on to normalize by the mean:

col_means = cell_means.mean(axis=0)
col_means
Out[11]: 
24hr    17.5
48hr    17.5
72hr    17.5
dtype: float64
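
The question ultimately asks for each mean divided by the corresponding D1 mean in the same column. A minimal sketch of that step is below. The numbers in cell_means are made up purely for illustration (in the example data every cell is 17.5, so every ratio would be 1), and baseline and normalized are hypothetical names.

```python
import pandas as pd

# Illustrative per-cell means; these values are invented for the example.
cell_means = pd.DataFrame(
    {
        '24hr': [10.0, 20.0, 5.0],
        '48hr': [20.0, 30.0, 10.0],
        '72hr': [40.0, 50.0, 60.0],
    },
    index=['D1', 'D2', 'D3'],
)

# The D1 row is a Series indexed by the time-point columns.
baseline = cell_means.loc['D1']

# Divide every row by the D1 row, matching the Series index to the columns.
normalized = cell_means.div(baseline, axis='columns')

print(normalized)
```

With this layout the D1 row of normalized is all ones, and the other rows give each sample relative to D1 at the same time point.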

