繁体   English   中英

如何根据条件从一个 dataframe 到另一个 dataframe 中的 select 行

[英]How to select rows from a dataframe based on conditions with another dataframe

我有两个这样的数据框

df1:
Date       Fruit   
2022-11-24 Banana 
2021-5-23  Orange  
2020-10-1  Apple   
df2:
Start_Date End_Date    Fruit
2021-11-20 2022-12-31  Banana 
2022-6-1   2022-10-31  Banana   
2013-14-1  2022-11-31  Banana 
2013-11-24 2022-10-31  Celery 
2013-11-25 2022-10-31  Apple  
2013-11-25 2022-10-31  Orange  

我想创建一个 dataframe df3 由 df2 中的行组成,条件如下:

  • df1 中有一行具有相同的 Fruit 值,并且 df1 的值 Date 在 df2 的 Start_Date 和 End_Date 之间

所以对于这个例子,df3 将是:

df3:
Start_Date End_Date    Fruit
2021-11-20 2022-12-31  Banana    
2013-14-01 2022-11-31  Banana  
2013-11-25 2022-10-31  Apple  
2013-11-25 2022-10-31  Orange  

我试过这个但它似乎不是更优化的方式并且需要时间来执行大型数据帧

df3 = pd.DataFrame()

for _, row in df1.iterrows():
    FRUIT= row['FRUIT']
    DATE= row['DATE']
    temp= df2[(df2['FRUIT'] == FRUIT) &(df2['Start_Date'] <= DATE) &(df2['End_Date'] >= DATE)] 
    
    df3 = pd.concat([df3 , temp])

谢谢您的帮助!

import pandas as pd

# create a list to store the rows of df3
df3_rows = []

# iterate through the rows of df1
for _, row in df1.iterrows():
  date = row['Date']
  fruit = row['Fruit']
  
  # use boolean indexing to select rows from df2 that meet the conditions
  df2_subset = df2[(df2['Fruit'] == fruit) & (df2['Start_Date'] <= date) & (df2['End_Date'] >= date)]
  
  # add the rows from df2_subset to df3_rows
  df3_rows.extend(df2_subset.to_dict('records'))

# create df3 from the list of rows
df3 = pd.DataFrame(df3_rows)
#For 2013-14-1, there is no month 14, it would get error. 
#And for 2022-11-31, there is no day 31 for Nov. 
#Please correct these two records before running the following code.

df1=pd.read_csv("test1.txt",parse_dates=[0],sep='\t')
df2=pd.read_csv("test2.txt",parse_dates=[0,1],sep='\t') 
#or you can also use pd.to_datetime to convert string into DateTime.
print(df1.dtypes)
print(df2.dtypes)
   
df3=df2.assign(Date=df2.Fruit.map(df1.set_index('Fruit').Date.to_dict())).dropna(subset='Date')
df3=df3.loc[df3.apply(lambda x:x.Date >= x.Start_Date and x.Date <= x.End_Date,axis=1)]
print(df3)

Output:

Date     datetime64[ns]
Fruit    object
dtype: object

Start_Date    datetime64[ns]
End_Date      datetime64[ns]
Fruit         object        
dtype: object


  Start_Date   End_Date   Fruit       Date
0 2021-11-20 2022-12-31  Banana 2022-11-24
2 2013-12-01 2022-11-30  Banana 2022-11-24
4 2013-11-25 2022-10-31  Apple  2020-10-01
5 2013-11-25 2022-10-31  Orange 2021-05-23

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM