How do I apply a value from a dataframe based on the value of a multi-index of another dataframe?
I have the following:
Dataframe 1 (a multi-index dataframe):
                            |       Assay_A       |
---------------------------------------------------
Index_A | Index_B | Index_C | mean | std | count |
---------------------------------------------------
128       12345     AAA       123    2     4
Dataframe 2:
Index | Col_A | Col_B | Col_C | mean
-------------------------------------
1       128     12345   AAA     456
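where Col_X corresponds to Index_X for X = A, B, C.
I have been trying all morning to do the following:
How do I select the correct mean in dataframe 2 (it has to match on Col_A, Col_B, and Col_C) so that I can do math with it? For example, I want to take the mean from dataframe 1 and divide it by the correctly selected mean from dataframe 2.
Ideally, I would like to store the result of the operation in a new column. The final output should look like this: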
                            |           Assay_A            |
------------------------------------------------------------
Index_A | Index_B | Index_C | mean | std | count | result |
------------------------------------------------------------
128       12345     AAA       123    2     4       0.26
Maybe there is a simpler way to do this; I am open to any such suggestions.
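What I would suggest is to 1) rename the columns of dataframe 2 to the names of the corresponding index columns of dataframe 1, 2) reset the index of dataframe 1, and 3) merge the two tables on the now-matching column names. After that, you can compute whatever you need. The MultiIndex on the columns of dataframe 1 adds a bit of extra overhead.
Explicitly: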
import pandas as pd

# re-create table1
row_index = pd.MultiIndex.from_tuples([(128, 12345, 'AAA')])
row_index.names = ['Index_A', 'Index_B', 'Index_C']
table1 = pd.DataFrame(data={'mean': 123, 'std': 2, 'count': 4}, index=row_index)
table1.columns = pd.MultiIndex.from_tuples(list(zip(['Assay A'] * 3, table1.columns)))
print("*** table 1:")
print(table1)
print()

# re-create table2
table2 = pd.DataFrame([{'Col_A': 128, 'Col_B': 12345, 'Col_C': 'AAA', 'mean': 456}], index=[1])
table2.index.name = 'Index'
print("*** table 2:")
print(table2)
print()

# re-name columns of table2 to match names of respective index columns in table1
table2 = table2.rename(columns={'Col_A': 'Index_A', 'Col_B': 'Index_B', 'Col_C': 'Index_C'})

# Drop 'Assay A' index level on columns of table1;
# without doing that, the following reset_index() will produce a column multi-index
# for Index_A/B/C, so column names will not match the simple column index of table2.
# If you need to keep the 'Assay A' level here, you will need to also construct a column
# multi-index for table2 (with empty values for the second level).
# droplevel(0) keeps the columns in their existing order; columns.levels[1] is sorted
# and can mislabel the columns when they are not in alphabetical order.
table1.columns = table1.columns.droplevel(0)

# Move index columns of table1 back to regular columns
table1 = table1.reset_index()

# Merge the two tables on the now common column names. 'mean' appears in both tables,
# give the column from table2 a suffix '_2'. suffixes must be an ordered pair (a tuple),
# not a set. The second reset_index() adds an 'index' column, which is dropped below.
joint = pd.merge(table1.reset_index(), table2, on=['Index_A', 'Index_B', 'Index_C'], suffixes=('', '_2'))
print("*** joint, before re-setting index:")
print(joint)
print()

# Restore index of the joint table
joint = joint.set_index(['Index_A', 'Index_B', 'Index_C'])

# Compute the 'result'
joint['result'] = joint['mean'] / joint['mean_2']

# drop unused columns
joint = joint.drop(['index', 'mean_2'], axis=1)

# restore column index level
joint.columns = pd.MultiIndex.from_tuples(list(zip(['Assay A'] * 4, joint.columns)))
print("*** final result:")
print(joint)
print()
The script output is (the column order may differ depending on the pandas version):
*** table 1:
                        Assay A
                          count mean std
Index_A Index_B Index_C
128     12345   AAA           4  123   2

*** table 2:
       Col_A  Col_B Col_C  mean
Index
1        128  12345   AAA   456

*** joint, before re-setting index:
   index  Index_A  Index_B Index_C  count  mean  std  mean_2
0      0      128    12345     AAA      4   123    2     456

*** final result:
                        Assay A
                          count mean std    result
Index_A Index_B Index_C
128     12345   AAA           4  123   2  0.269737
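If flattening and restoring the column MultiIndex feels heavy, a shorter variant may work as well. The following is only a sketch, not part of the script above: it assumes each (Col_A, Col_B, Col_C) combination appears at most once in table 2, and the names `lookup` and `denominator` are made up for illustration. It turns table 2 into a Series keyed by the three columns, aligns that Series with the row index of table 1, and writes the quotient into a new column under 'Assay A'.

```python
import pandas as pd

# Re-create the two example tables from the question
row_index = pd.MultiIndex.from_tuples(
    [(128, 12345, "AAA")], names=["Index_A", "Index_B", "Index_C"]
)
table1 = pd.DataFrame(
    [[123, 2, 4]],
    index=row_index,
    columns=pd.MultiIndex.from_product([["Assay A"], ["mean", "std", "count"]]),
)
table2 = pd.DataFrame(
    [{"Col_A": 128, "Col_B": 12345, "Col_C": "AAA", "mean": 456}], index=[1]
)

# Series of table2 means, keyed by the same three values as table1's row index
# (assumption: the key combinations in table2 are unique)
lookup = table2.set_index(["Col_A", "Col_B", "Col_C"])["mean"]
lookup.index.names = table1.index.names

# Align the lookup with table1's rows; keys missing from table2 become NaN
denominator = lookup.reindex(table1.index)

# Store the quotient as a new column below the 'Assay A' level
table1[("Assay A", "result")] = table1[("Assay A", "mean")] / denominator
print(table1)
```

Rows of table 1 without a matching key in table 2 end up with NaN in 'result' instead of being dropped, which is different from the inner merge used above.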
Hope that helps!