if .isin() dataframe 1, check condition in dataframe 2, append new dataframe with checked conditions

Question

I have two dataframes.

df1 has an id and a list of dates:

id	e1	e2	e3
1	2012-09-12	2001-03-06	1999-09-03
2	2009-09-07	2002-04-06	2003-01-02
3	2005-08-09	2005-06-04	2008-01-02

df2 has the same ids with other values:

id	e1	e2	e3
1	A120	B130	C122
2	BD43	A200	A111
3	C890	B123	A190

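I want to go through df2 and, in every column, look for values that start with "A" (e.g. A120, A200, etc.). Whenever such a value is found, I go to the same row × column position in df1 and check whether the date is >= 2005-01-01. Every id that satisfies both conditions should be added to a new dataframe.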

So the ideal result should look like this:

id	e1	e2	e3
1	A120	B130	C122
3	C890	B123	A190
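The rule described above can be expressed without a loop by building two boolean masks of the same shape and combining them cell by cell. A minimal sketch, assuming both frames have an `id` column, the dates are ISO-format strings (parsed here with `pd.to_datetime`), and "at least one matching cell per row" is the intended meaning:

```python
import pandas as pd

# Sample data from the question (dates as ISO-format strings)
df1 = pd.DataFrame({
    "id": [1, 2, 3],
    "e1": ["2012-09-12", "2009-09-07", "2005-08-09"],
    "e2": ["2001-03-06", "2002-04-06", "2005-06-04"],
    "e3": ["1999-09-03", "2003-01-02", "2008-01-02"],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3],
    "e1": ["A120", "BD43", "C890"],
    "e2": ["B130", "A200", "B123"],
    "e3": ["C122", "A111", "A190"],
})

# Index both frames by id so each (row, column) cell refers to the same position
dates = df1.set_index("id").apply(pd.to_datetime)
codes = df2.set_index("id")

# Condition 1: the df2 value starts with "A"
starts_a = codes.apply(lambda s: s.str.startswith("A"))
# Condition 2: the df1 date in the same cell is on or after the cutoff
late = dates.ge(pd.Timestamp("2005-01-01"))

# Keep an id if both conditions hold in at least one column
keep = (starts_a & late).any(axis=1)
result = codes[keep].reset_index()
print(result)
```

Because the masks are computed column-wise by pandas rather than per cell in Python, this should scale far better than a nested loop on large frames.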

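The only way I have managed is to loop over both matrices, but it is very slow because the dataframes are very large. Is there a different way to solve this?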
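For comparison, a cell-by-cell loop of the kind mentioned above might look roughly like this. It is a hypothetical reconstruction, not the asker's code: it does one Python-level check per cell, which is why it scales poorly, but it can serve as a reference for checking a vectorized version on a small sample.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3],
    "e1": ["2012-09-12", "2009-09-07", "2005-08-09"],
    "e2": ["2001-03-06", "2002-04-06", "2005-06-04"],
    "e3": ["1999-09-03", "2003-01-02", "2008-01-02"],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3],
    "e1": ["A120", "BD43", "C890"],
    "e2": ["B130", "A200", "B123"],
    "e3": ["C122", "A111", "A190"],
})

CUTOFF = pd.Timestamp("2005-01-01")
cols = ["e1", "e2", "e3"]

# Index by id so rows are matched by id, not by position
d1 = df1.set_index("id")
d2 = df2.set_index("id")

kept_ids = []
for row_id in d2.index:
    for col in cols:
        code = d2.at[row_id, col]
        date = pd.Timestamp(d1.at[row_id, col])
        if code.startswith("A") and date >= CUTOFF:
            kept_ids.append(row_id)
            break  # one matching cell is enough for this id

result = d2.loc[kept_ids].reset_index()
print(result)
```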

Answer 1

You can use boolean indexing:

ref = '2005-01-01'

# is the date strictly before ref? (lt, not le, so a date equal to ref still counts as ">= 2005-01-01")
m1 = df1.set_index('id').lt(ref)
# is the string starting with A?
m2 = df2.set_index('id').apply(lambda s: s.str.startswith('A'))

# if both conditions are matched anywhere in the row, drop it
# (.to_numpy() strips the id index so the mask lines up with df1's rows by position)
out = df1[~(m1&m2).any(axis=1).to_numpy()]

Note: if id is already the index, skip the set_index('id') step. Output:

   id          e1          e2          e3
0   1  2012-09-12  2001-03-06  1999-09-03
2   3  2005-08-09  2005-06-04  2008-01-02
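The mask above drops a row only when some A-cell has an early date, which is close to, but not the same as, the question's rule of keeping a row when some A-cell has a late date. On the sample data both give ids 1 and 3, but they diverge for a row with no A-prefixed value at all. A minimal sketch, using a hypothetical extra row (id 4, not from the question) and parsing dates with `pd.to_datetime` so the comparison is chronological rather than string-based:

```python
import pandas as pd

# Question's data plus a hypothetical row (id 4) with no "A" code in df2
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "e1": ["2012-09-12", "2009-09-07", "2005-08-09", "2010-01-01"],
    "e2": ["2001-03-06", "2002-04-06", "2005-06-04", "2010-01-01"],
    "e3": ["1999-09-03", "2003-01-02", "2008-01-02", "2010-01-01"],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "e1": ["A120", "BD43", "C890", "B001"],
    "e2": ["B130", "A200", "B123", "C002"],
    "e3": ["C122", "A111", "A190", "D003"],
})

ref = pd.Timestamp("2005-01-01")
dates = df1.set_index("id").apply(pd.to_datetime)
starts_a = df2.set_index("id").apply(lambda s: s.str.startswith("A"))

# Answer's formulation: drop ids where an "A" cell has a date before ref
drop_bad = ~(starts_a & dates.lt(ref)).any(axis=1)
# Question's formulation: keep ids where an "A" cell has a date on/after ref
keep_good = (starts_a & dates.ge(ref)).any(axis=1)

print(drop_bad[drop_bad].index.tolist())
print(keep_good[keep_good].index.tolist())
```

The two also disagree on a row that has one early and one late A-cell: the first formulation drops it, the second keeps it. To get df2's values, as in the question's expected result, index df2 with whichever mask matches the intended rule.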

Answer 1 · 0 · 2022-11-25 10:45:52