How to "multiply" dataframes with each other in Python?
I have two dataframes in Python/pandas that look as follows:
df1 =
[[01/01/2001, 01/04/2004, 12/12/2007],
[02/07/2002, NA, NA],
[04/08/2012, 02/11/2018, NA]]
df2 =
[[1, 3, 2],
[2, NA, NA],
[3, 1, NA]]
I would like to create a third dataframe that looks as follows:
df3 =
[[01/01/2001, 01/04/2004, 01/04/2004, 01/04/2004, 12/12/2007, 12/12/2007],
[02/07/2002, 02/07/2002, NA, NA, NA, NA],
[04/08/2012, 04/08/2012, 04/08/2012, 02/11/2018, NA, NA]]
In other words, the second df gives the number of times that I want to copy the corresponding value of the first df into the third one. For lack of a better word, I called this "multiplying" in the question, even though I realize that this is probably wrong.
Does someone know of a way to do this efficiently? My approach would be to work with loops and lists for each row, but I'm guessing that there should be a much more efficient way of doing this in Python. Many thanks for your help, and sorry again for probably using bad terminology here.
This logic does not lend itself to a fully vectorized solution, but we can still benefit from numpy and Python's built-in list comprehension.
LOGIC:
1. Use np.repeat, one of NumPy's array manipulation routines, to repeat the values along each row of df1. The repeats argument of np.repeat is the corresponding row of df2:
np.repeat(df1.iloc[i,:], df2_u.iloc[i,:].astype('i4'))
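As a minimal sketch of what np.repeat does with one count per element (the arrays values and counts are made up for illustration and are not part of the original data):

```python
import numpy as np

# Illustrative inputs: one repeat count per element of `values`
values = np.array(["a", "b", "c"])
counts = np.array([1, 3, 2])

# Each element is repeated as many times as its matching count
repeated = np.repeat(values, counts)
print(repeated.tolist())  # ['a', 'b', 'b', 'b', 'c', 'c']
```

This mirrors the first row of the question, where the counts 1, 3, 2 expand three dates into six.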
2. Note that the repeats argument must be of integer type, so inside the list comprehension each row of df2 is converted with astype('i4'), which is the np.int32 datatype:
df2_u.iloc[i,:].astype('i4')
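The conversion matters because a column that contains NA is stored as floats. A small sketch, using made-up arrays, of how plain NumPy is expected to react to float counts and how astype('i4') resolves it:

```python
import numpy as np

# Illustrative inputs: counts stored as floats, as happens when a column holds NA
values = np.array(["x", "y"])
float_counts = np.array([2.0, 1.0])

try:
    np.repeat(values, float_counts)
    float_ok = True
except TypeError:
    # NumPy refused to cast the float counts to integers
    float_ok = False

# 'i4' is shorthand for np.int32
int_counts = float_counts.astype("i4")
result = np.repeat(values, int_counts).tolist()
print(float_ok, result)
```

The try/except is only there to show the behavior without stopping the script; the integer version is the one used in the solution.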
3. Lastly, how to handle the np.nan values in df2: create df2_u, a copy of df2 in which NA is filled with 0, so that the corresponding cells of df1 are repeated zero times:
df2_u = df2.fillna(0)
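A short sketch of this step on a single row, assuming the second row of the question as input (the names counts, values and counts_filled are illustrative only):

```python
import numpy as np
import pandas as pd

# Second row of the question: one real value followed by two missing ones
values = pd.Series(["02/07/2002", np.nan, np.nan])
counts = pd.Series([2, np.nan, np.nan])  # NaN makes this a float column

# Fill the missing counts with 0 first, then convert to integers
counts_filled = counts.fillna(0).astype("i4")

# A count of 0 drops the element, so the NaN cells vanish from the row
row = np.repeat(values.to_numpy(), counts_filled.to_numpy()).tolist()
print(row)  # ['02/07/2002', '02/07/2002']
```

The order matters here: filling comes before the integer conversion, since NaN has no integer representation.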
Generalized solution: the logic relies on the fact that passing a list of lists with nested lists of unequal size to pd.DataFrame produces a DataFrame whose shorter rows are broadcast to the full width, with every undefined cell filled in as a missing value (np.nan).
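The padding behavior can be checked in isolation. A minimal sketch with a made-up numeric ragged list:

```python
import pandas as pd

# Illustrative ragged input: rows of length 3, 1 and 2
ragged = [[1, 2, 3], [4], [5, 6]]

# pandas widens every row to the longest one and marks the gaps as missing
padded = pd.DataFrame(ragged)
print(padded)
print(padded.isna().sum().sum())  # 3 cells were padded
```

Because the padding happens in the DataFrame constructor, the rows built by np.repeat do not need to be brought to equal length by hand.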
CODE:
import pandas as pd
import numpy as np
df1 = pd.DataFrame([['01/01/2001', '01/04/2004', '12/12/2007'],
['02/07/2002', np.nan, np.nan],
['04/08/2012', '02/11/2018', np.nan]])
df2 = pd.DataFrame([[1, 3, 2], [2, np.nan, np.nan], [3, 1, np.nan]])
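# Missing counts become 0, so the matching df1 cells are repeated zero times
df2_u = df2.fillna(0)
# Repeat each df1 row by the integer counts in the matching df2_u row;
# pd.DataFrame pads the shorter rows of the resulting ragged list
df3 = pd.DataFrame([
    list(np.repeat(df1.iloc[i, :], df2_u.iloc[i, :].astype('i4')))
    for i in range(df1.shape[0])
])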
print(df3)
OUTPUT:
[['01/01/2001' '01/04/2004' '01/04/2004' '01/04/2004' '12/12/2007' '12/12/2007']
['02/07/2002' '02/07/2002' nan nan nan nan]
['04/08/2012' '04/08/2012' '04/08/2012' '02/11/2018' nan nan]]
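If this is needed in more than one place, the same logic could be wrapped in a small helper. The sketch below is one possible variant, not part of the original answer: the name repeat_by_counts is hypothetical, and it iterates over plain NumPy rows instead of calling iloc per row, on the assumption that this avoids some per-row pandas overhead.

```python
import numpy as np
import pandas as pd


def repeat_by_counts(values: pd.DataFrame, counts: pd.DataFrame) -> pd.DataFrame:
    """Repeat each cell of `values` by the matching cell of `counts`, row by row.

    Hypothetical helper: missing counts are treated as 0.
    """
    vals = values.to_numpy()
    reps = counts.fillna(0).to_numpy().astype(np.intp)
    # Rows come out with different lengths; pd.DataFrame pads the short ones
    rows = [np.repeat(v, r).tolist() for v, r in zip(vals, reps)]
    return pd.DataFrame(rows)


df1 = pd.DataFrame([['01/01/2001', '01/04/2004', '12/12/2007'],
                    ['02/07/2002', np.nan, np.nan],
                    ['04/08/2012', '02/11/2018', np.nan]])
df2 = pd.DataFrame([[1, 3, 2], [2, np.nan, np.nan], [3, 1, np.nan]])

df3 = repeat_by_counts(df1, df2)
print(df3)
```

The loop over rows remains, so this is still not fully vectorized; it only keeps the per-row work inside NumPy.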