Collect cells in pandas df that are listed in another pandas df (with same index)
Consider the following example. The two elements of interest are final_df and pivot_df; the rest of the code is just to construct these two df's:
import numpy
import pandas

numpy.random.seed(0)

# Ten random keys (rounded to 2 decimals) paired with ten random 0/1 values
input_df = pandas.concat([pandas.Series(numpy.round(numpy.random.random_sample(10), 2)),
                          pandas.Series(numpy.random.randint(0, 2, 10))], axis=1)
input_df.columns = ['key', 'val']

# One column per key; forward-fill, then take the running sum down each column
pivot_df = input_df.pivot(columns='key', values='val')\
                   .ffill()\
                   .cumsum()

# Replace every non-null cell by its column name; null cells become NaN
index_df = pivot_df.notnull()\
                   .multiply(pivot_df.columns, axis=1)\
                   .replace({0.0: numpy.nan})\
                   .values

# Keep the three smallest column names of each row, sorted ascending
final_df = numpy.delete(numpy.partition(index_df, 3, axis=1),
                        numpy.s_[3:index_df.shape[1]], axis=1)
final_df.sort(axis=1)
final_df = pandas.DataFrame(final_df)
final_df contains as many rows as pivot_df. I want to use these two to construct a third df: bingo_df. bingo_df should have the same dimensions as final_df. Then, the cells of bingo_df should contain:
If the entry (row = i, col = j) of final_df is numpy.nan, the entry (i, j) of bingo_df should be numpy.nan as well.
If the entry (i, j) of final_df is not numpy.nan, the entry (i, j) of bingo_df should be the value at cell [i, final_df[i, j].value] of pivot_df (in fact, final_df[i, j].value is either the name of a column of pivot_df or numpy.nan).
For example, the first row of final_df is:
0.55, nan, nan
So I'm expecting the first row of bingo_df to be:
0.0, nan, nan
because the value in cell (row = 0, col = 0.55) of pivot_df is 0 (and the two subsequent numpy.nan in the first row of final_df should also be numpy.nan in bingo_df).
The second row of final_df is:
0.55, 0.72, nan
So I'm expecting the second row of bingo_df to be:
0.0, 1.0, nan
because the value in cell (row = 1, col = 0.55) of pivot_df is 0.0 and the value in cell (row = 1, col = 0.72) of pivot_df is 1.0.
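To pin down the rule described above, here is a minimal sketch of it as a plain double loop. The function name build_bingo_loop and the small hand-made frames are assumptions for illustration only; they mimic the first rows of the example rather than reproduce the construction code.

```python
import numpy as np
import pandas as pd


def build_bingo_loop(final_df, pivot_df):
    """Hypothetical helper: apply the cell-by-cell rule literally (slow but explicit)."""
    # Start from an all-NaN frame with the same shape and labels as final_df
    bingo = pd.DataFrame(np.nan, index=final_df.index, columns=final_df.columns)
    for i in final_df.index:
        for j in final_df.columns:
            key = final_df.at[i, j]
            # A NaN key stays NaN; otherwise the key names a column of pivot_df
            if pd.notna(key):
                bingo.at[i, j] = pivot_df.at[i, key]
    return bingo


# Hand-made stand-ins for pivot_df and final_df (assumed values, same row index)
nan = np.nan
small_pivot = pd.DataFrame({0.55: [0.0, 0.0, 0.0],
                            0.60: [nan, nan, 1.0],
                            0.72: [nan, 1.0, 2.0]})
small_final = pd.DataFrame([[0.55, nan, nan],
                            [0.55, 0.72, nan],
                            [0.55, 0.60, 0.72]])

bingo = build_bingo_loop(small_final, small_pivot)
print(bingo)
```

A loop like this is mainly useful as a reference to check a faster solution against, since it visits every cell in Python.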
IIUC, use lookup:
import pandas as pd
s = final_df.stack()  # drops the NaN cells, leaving (row, position) -> column name
pd.Series(pivot_df.lookup(s.index.get_level_values(0), s), index=s.index).unstack()
Out[87]:
     0    1    2
0  0.0  NaN  NaN
1  0.0  1.0  NaN
2  0.0  1.0  2.0
3  0.0  0.0  2.0
4  0.0  0.0  0.0
5  0.0  0.0  0.0
6  0.0  1.0  0.0
7  0.0  2.0  0.0
8  0.0  3.0  0.0
9  0.0  0.0  4.0
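Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so the call above may not be available in a current install. One possible replacement, sketched below, uses Index.get_indexer plus NumPy integer indexing. The function name build_bingo_vectorized and the small test frames are assumptions, and the sketch relies on final_df and pivot_df having their rows in the same order.

```python
import numpy as np
import pandas as pd


def build_bingo_vectorized(final_df, pivot_df):
    """Hypothetical lookup-free variant; assumes both frames share the same row order."""
    keys = final_df.to_numpy(dtype=float)
    # Column position of each key in pivot_df; -1 where the key is NaN or unknown
    col_pos = pivot_df.columns.get_indexer(keys.ravel()).reshape(keys.shape)
    # Row positions, shaped (n, 1) so they broadcast against col_pos
    row_pos = np.arange(keys.shape[0])[:, None]
    # Integer indexing returns a copy, so pivot_df is not modified below
    values = pivot_df.to_numpy(dtype=float)[row_pos, col_pos]
    # -1 picked the last column as a placeholder; overwrite those cells with NaN
    values[col_pos == -1] = np.nan
    return pd.DataFrame(values, index=final_df.index, columns=final_df.columns)


# Hand-made stand-ins for pivot_df and final_df (assumed values)
nan = np.nan
demo_pivot = pd.DataFrame({0.55: [0.0, 0.0, 0.0],
                           0.60: [nan, nan, 1.0],
                           0.72: [nan, 1.0, 2.0]})
demo_final = pd.DataFrame([[0.55, nan, nan],
                           [0.55, 0.72, nan],
                           [0.55, 0.60, 0.72]])

demo_bingo = build_bingo_vectorized(demo_final, demo_pivot)
print(demo_bingo)
```

The demo frames are modelled on the first three rows of the example, so the printed result should agree with the first three rows of the output shown above.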