Multiply Python Pandas dataframes together to get product of values in column
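I need help creating a Python function that does the following:
1) Takes 3 Pandas dataframes as input (each with an index column and an associated integer or float value in the second column). They are defined as follows: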
import pandas as pd
df1=pd.DataFrame([['placementA',2],['placementB',4]],columns=
['placement','value'])
df1.set_index('placement',inplace=True)
df2=pd.DataFrame([['strategyA',1],['strategyB',5],['strategyC',6]],columns=
['strategy','value'])
df2.set_index('strategy',inplace=True)
df3=pd.DataFrame([['categoryA',1.5],['categoryB',2.5]],columns=
['category','value'])
df3.set_index('category',inplace=True)
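2) Using these three dataframes, creates a new dataframe ('df4') whose first 3 columns hold every possible combination of the 3 indexes;
3) In the 4th column, holds the mathematical product of all the associated 'value' entries from the three source dataframes. The function's DataFrame output should therefore look like this: https://ibb.co/cypEY6
Thanks in advance for your help.
Colin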
Use product over all the indexes and value columns, create the DataFrame with the constructor, and take the row-wise product of all columns with prod:
from itertools import product
names = ['placement','strategy','category']
mux = pd.MultiIndex.from_product([df1.index, df2.index, df3.index], names=names)
df = (pd.DataFrame(list(product(df1['value'], df2['value'], df3['value'])), index=mux)
        .prod(axis=1).reset_index(name='mult'))
print(df)
     placement   strategy   category  mult
0   placementA  strategyA  categoryA   3.0
1   placementA  strategyA  categoryB   5.0
2   placementA  strategyB  categoryA  15.0
3   placementA  strategyB  categoryB  25.0
4   placementA  strategyC  categoryA  18.0
5   placementA  strategyC  categoryB  30.0
6   placementB  strategyA  categoryA   6.0
7   placementB  strategyA  categoryB  10.0
8   placementB  strategyB  categoryA  30.0
9   placementB  strategyB  categoryB  50.0
10  placementB  strategyC  categoryA  36.0
11  placementB  strategyC  categoryB  60.0
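The approach above relies on MultiIndex.from_product and itertools.product enumerating combinations in the same order, so each row label lines up with the right tuple of values. A small check of that assumption, using made-up indexes (left and right are illustrative names, not from the question):

```python
from itertools import product

import pandas as pd

# Two small illustrative indexes (hypothetical data, not from the question).
left = pd.Index(["a", "b"], name="x")
right = pd.Index([1, 2, 3], name="y")

# Cartesian product built by pandas.
mux = pd.MultiIndex.from_product([left, right])

# Cartesian product built by itertools, in iteration order.
pairs = list(product(left, right))

# Both enumerate the last input fastest, so the tuples match position by position.
same_order = list(mux) == pairs
print(same_order)
```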
Another approach is to multiply all the values in a list comprehension:
import operator
import functools
from itertools import product
names = ['placement','strategy','category']
a = list(product(df1.index, df2.index, df3.index))
b = product(df1['value'], df2['value'], df3['value'])
data = [functools.reduce(operator.mul, x, 1) for x in b]
df = pd.DataFrame(a, columns=names).assign(mult=data)
print (df)
     placement   strategy   category  mult
0   placementA  strategyA  categoryA   3.0
1   placementA  strategyA  categoryB   5.0
2   placementA  strategyB  categoryA  15.0
3   placementA  strategyB  categoryB  25.0
4   placementA  strategyC  categoryA  18.0
5   placementA  strategyC  categoryB  30.0
6   placementB  strategyA  categoryA   6.0
7   placementB  strategyA  categoryB  10.0
8   placementB  strategyB  categoryA  30.0
9   placementB  strategyB  categoryB  50.0
10  placementB  strategyC  categoryA  36.0
11  placementB  strategyC  categoryB  60.0
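On Python 3.8 or later, the functools.reduce(operator.mul, x, 1) call can probably be swapped for math.prod, which multiplies the items of an iterable. A minimal sketch of that variant, rebuilding the question's dataframes so it runs on its own (keys is an illustrative name):

```python
import math
from itertools import product

import pandas as pd

# Same input data as in the question.
df1 = pd.DataFrame([['placementA', 2], ['placementB', 4]],
                   columns=['placement', 'value']).set_index('placement')
df2 = pd.DataFrame([['strategyA', 1], ['strategyB', 5], ['strategyC', 6]],
                   columns=['strategy', 'value']).set_index('strategy')
df3 = pd.DataFrame([['categoryA', 1.5], ['categoryB', 2.5]],
                   columns=['category', 'value']).set_index('category')

names = ['placement', 'strategy', 'category']
# Every combination of index labels.
keys = list(product(df1.index, df2.index, df3.index))
# math.prod multiplies each tuple of values (requires Python 3.8+).
data = [math.prod(vals)
        for vals in product(df1['value'], df2['value'], df3['value'])]
df = pd.DataFrame(keys, columns=names).assign(mult=data)
print(df)
```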
A dynamic solution with a list of DataFrames; each one must have the same column name, value:
dfs = [df1, df2, df3]
names = ['placement','strategy','category']
a = list(product(*[x.index for x in dfs]))
b = list(product(*[x['value'] for x in dfs]))
data = pd.DataFrame(b).product(1)
df = pd.DataFrame(a, columns=names).assign(mult=data)
print (df)
     placement   strategy   category  mult
0   placementA  strategyA  categoryA   3.0
1   placementA  strategyA  categoryB   5.0
2   placementA  strategyB  categoryA  15.0
3   placementA  strategyB  categoryB  25.0
4   placementA  strategyC  categoryA  18.0
5   placementA  strategyC  categoryB  30.0
6   placementB  strategyA  categoryA   6.0
7   placementB  strategyA  categoryB  10.0
8   placementB  strategyB  categoryA  30.0
9   placementB  strategyB  categoryB  50.0
10  placementB  strategyC  categoryA  36.0
11  placementB  strategyC  categoryB  60.0
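Since the question asks for a function, the dynamic solution could be wrapped roughly as follows. This is only a sketch: multiply_frames, value_col and out_col are hypothetical names, and it assumes every input frame has a named index and a shared value column.

```python
from itertools import product

import pandas as pd


def multiply_frames(dfs, value_col="value", out_col="mult"):
    """Combine all index labels and multiply the matching values.

    Hypothetical helper: assumes each frame has a named index and a
    column called `value_col`.
    """
    # Output column names are taken from each frame's index name.
    names = [df.index.name for df in dfs]
    # Label combinations and value combinations are produced in the same order.
    keys = list(product(*[df.index for df in dfs]))
    values = list(product(*[df[value_col] for df in dfs]))
    out = pd.DataFrame(keys, columns=names)
    # Row-wise product across the combined values.
    out[out_col] = pd.DataFrame(values).prod(axis=1)
    return out


# Same input data as in the question.
df1 = pd.DataFrame([['placementA', 2], ['placementB', 4]],
                   columns=['placement', 'value']).set_index('placement')
df2 = pd.DataFrame([['strategyA', 1], ['strategyB', 5], ['strategyC', 6]],
                   columns=['strategy', 'value']).set_index('strategy')
df3 = pd.DataFrame([['categoryA', 1.5], ['categoryB', 2.5]],
                   columns=['category', 'value']).set_index('category')

df4 = multiply_frames([df1, df2, df3])
print(df4)
```

Taking the column names from the index names avoids passing a separate names list, at the cost of requiring every input index to be named.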