How to "multiply" dataframes with each other in Python?

I have two dataframes in Python/pandas that look as follows:

df1 =
[[01/01/2001, 01/04/2004, 12/12/2007],
[02/07/2002, NA, NA],
[04/08/2012, 02/11/2018, NA]]

df2 =
[[1, 3, 2],
[2, NA, NA],
[3, 1, NA]]

I would like to create a third dataframe that looks as follows:

df3 =
[[01/01/2001, 01/04/2004, 01/04/2004, 01/04/2004, 12/12/2007, 12/12/2007],
[02/07/2002, 02/07/2002, NA, NA, NA, NA],
[04/08/2012, 04/08/2012, 04/08/2012, 02/11/2018, NA, NA]]

In other words, the second df gives the number of times I want to copy the corresponding value of the first df into the third one. For lack of a better word, I called this "multiplying" in the question, even though I realize that this is probably wrong.
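For a single row, this kind of "multiplying" is what NumPy's np.repeat does: each value is copied as many times as the matching count says. A minimal sketch on the first row of the example data (values and counts are simply that row typed out by hand):

```python
import numpy as np

# First row of df1 and of df2 from the question
values = ['01/01/2001', '01/04/2004', '12/12/2007']
counts = [1, 3, 2]

# Each values[j] is repeated counts[j] times, keeping the original order
row = np.repeat(values, counts)
print(row.tolist())
# ['01/01/2001', '01/04/2004', '01/04/2004', '01/04/2004', '12/12/2007', '12/12/2007']
```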

Does someone know of a way to do this efficiently? My approach would be to work with loops and lists for each row, but I'm guessing there should be a much more efficient way of doing this in Python. Many thanks for your help, and sorry again for probably using bad terminology here.

A fully vectorized solution is not straightforward with this row-by-row logic, but we can still benefit from NumPy combined with Python's built-in list comprehension.

LOGIC:
1. Use np.repeat, one of NumPy's array manipulation routines, to repeat the values along each row of df1, passing the corresponding row of the count frame (df2_u, built in step 3) as the repeats argument:

np.repeat(df1.iloc[i,:], df2_u.iloc[i,:].astype('i4'))

2. Note that the repeats argument must have an integer type, so inside the list comprehension each row of df2_u is cast with astype('i4'), i.e. NumPy's int32:

df2_u.iloc[i,:].astype('i4')

3. Finally, the NA entries of df2 cannot be cast to integers, so fill them with 0 first and call the result df2_u. A count of 0 means the corresponding NaN of df1 is simply not copied:

df2_u = df2.fillna(0)
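The three steps can be traced on a single row. This sketch assumes the df1/df2 data from the question and uses row 1, which contains NA values; it also shows why the fillna(0) step has to come before the integer cast (pandas raises a ValueError, or a subclass of it, when asked to cast NaN to int):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([['01/01/2001', '01/04/2004', '12/12/2007'],
                    ['02/07/2002', np.nan, np.nan],
                    ['04/08/2012', '02/11/2018', np.nan]])
df2 = pd.DataFrame([[1, 3, 2], [2, np.nan, np.nan], [3, 1, np.nan]])

# Step 3: casting a row that still holds NaN to int fails
try:
    df2.iloc[1, :].astype('i4')
    cast_error = None
except ValueError as exc:
    cast_error = exc
print('cast without fillna:', type(cast_error).__name__)

df2_u = df2.fillna(0)

# Step 2: integer repeat counts for row 1 -> [2, 0, 0] as int32
counts = df2_u.iloc[1, :].astype('i4')

# Step 1: repeat row 1 of df1; the NaN cells get a count of 0 and vanish
repeated = np.repeat(df1.iloc[1, :], counts)
print(list(repeated))  # ['02/07/2002', '02/07/2002']
```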

Generalized solution: the list comprehension yields rows of unequal length, and passing such a list of lists to pd.DataFrame pads the shorter rows with missing values, which produces the trailing NA cells of df3.
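A small illustration of that padding, using two hypothetical short rows. Depending on the pandas version and column dtype, the padded cells of a string column may come out as None rather than NaN, so this sketch normalises them with fillna(np.nan):

```python
import numpy as np
import pandas as pd

# Hypothetical rows of unequal length, like those built by the list comprehension
rows = [['01/01/2001', '01/04/2004', '01/04/2004'],
        ['02/07/2002']]

# The shorter row is padded up to the longest one; fillna(np.nan) makes
# sure the padding is NaN regardless of how pandas filled it in
padded = pd.DataFrame(rows).fillna(np.nan)
print(padded.shape)                # (2, 3)
print(padded.isna().sum().sum())   # 2
```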

CODE:

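import pandas as pd
import numpy as np

df1 = pd.DataFrame([['01/01/2001', '01/04/2004', '12/12/2007'],
                    ['02/07/2002', np.nan, np.nan],
                    ['04/08/2012', '02/11/2018', np.nan]])

df2 = pd.DataFrame([[1, 3, 2], [2, np.nan, np.nan], [3, 1, np.nan]])

# NA counts cannot be cast to int; a count of 0 drops the matching df1 cell
df2_u = df2.fillna(0)

# Repeat each df1 value as often as the matching df2_u count says (one list per row)
rows = [list(np.repeat(df1.iloc[i, :], df2_u.iloc[i, :].astype('i4')))
        for i in range(df1.shape[0])]

# pd.DataFrame pads the shorter rows; fillna(np.nan) turns any None padding into NaN
df3 = pd.DataFrame(rows).fillna(np.nan)
print(df3.to_numpy())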

OUTPUT:

[['01/01/2001' '01/04/2004' '01/04/2004' '01/04/2004' '12/12/2007' '12/12/2007']
 ['02/07/2002' '02/07/2002' nan nan nan nan]
 ['04/08/2012' '04/08/2012' '04/08/2012' '02/11/2018' nan nan]]
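If the per-row Python loop ever becomes a bottleneck, one possible loop-free variant (a different technique from the one in this answer, not the author's method) is to call np.repeat once on the flattened arrays and then compute each value's target column from the cumulative row lengths. repeat_rows is a hypothetical helper name; the sketch assumes df1 and df2 have the same non-empty shape and that df2 holds whole-number counts, with NA treated as 0:

```python
import numpy as np
import pandas as pd


def repeat_rows(df1, df2):
    """Hypothetical helper: copy df1[i, j] df2[i, j] times per row, pad with NaN."""
    values = df1.to_numpy(dtype=object)
    counts = df2.fillna(0).to_numpy().astype(np.intp)  # NA count -> 0 copies

    # One np.repeat over all cells in row-major order, so each row's output is contiguous
    flat = np.repeat(values.ravel(), counts.ravel())

    lengths = counts.sum(axis=1)               # output length of each row
    starts = np.cumsum(lengths) - lengths      # offset of each row inside `flat`
    row_ids = np.repeat(np.arange(len(lengths)), lengths)
    col_ids = np.arange(flat.size) - np.repeat(starts, lengths)

    # Scatter into a NaN-filled object array; untouched cells stay NaN (the padding)
    out = np.full((len(lengths), lengths.max()), np.nan, dtype=object)
    out[row_ids, col_ids] = flat
    return pd.DataFrame(out)


df1 = pd.DataFrame([['01/01/2001', '01/04/2004', '12/12/2007'],
                    ['02/07/2002', np.nan, np.nan],
                    ['04/08/2012', '02/11/2018', np.nan]])
df2 = pd.DataFrame([[1, 3, 2], [2, np.nan, np.nan], [3, 1, np.nan]])

df3 = repeat_rows(df1, df2)
print(df3)
```

The scatter into a pre-filled NaN array replaces both the list comprehension and the DataFrame padding step; for small frames like the one in the question, the list-comprehension version is simpler and likely fast enough.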

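Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM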