Remove previous rows from a DataFrame based on a condition
I have two DataFrames, say df1 (the primary DataFrame) and df2. I want to drop all earlier rows from df1 based on a condition from df2.
My DataFrames look like this:
df2
tradingsymbol Time
0 AAAA 12:54:40
1 BBBB 12:53:33
2 CCCC 12:51:50
df1.head(20)
tradingsymbol Time last_price
0 AAAA 09:20:10 84.40
1 AAAA 09:20:10 85.95
2 AAAA 12:55:60 84.70 <-Valid Row
3 AAAA 13:22:10 86.35 <-Valid Row
4 AAAA 14:55:40 87.10 <-Valid Row
5 BBBB 09:20:13 88.95
6 BBBB 09:20:13 88.80
7 BBBB 09:20:14 88.30
8 BBBB 14:23:11 87.30 <-Valid Row
9 CCCC 09:20:15 90.15
10 CCCC 09:20:16 90.10
11 CCCC 09:20:17 91.05
12 CCCC 09:20:18 90.95
For each tradingsymbol, I want to remove all rows from df1 whose Time is earlier than that symbol's Time in df2. I want the result to look like this:
tradingsymbol Time last_price
2 AAAA 12:55:60 84.70
3 AAAA 13:22:10 86.35
4 AAAA 14:55:40 87.10
8 BBBB 14:23:11 87.30
You can use pd.concat to stack df2 under df1, then sort the values so that a cumulative flag marks the rows to remove.
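The trick relies on a per-symbol running count of a marker row. A minimal sketch on a toy frame (one symbol, with the cutoff row typed in by hand to mimic what pd.concat produces; Time kept as zero-padded HH:MM:SS strings, which sort chronologically) shows how the flag separates the rows. It casts the marker to int before cumsum, a small deviation from the answer's code.

```python
import numpy as np
import pandas as pd

# Toy frame: three AAAA prices plus one cutoff row with NaN last_price,
# i.e. what pd.concat([df1, df2]) yields for a single symbol.
df = pd.DataFrame({
    "tradingsymbol": ["AAAA"] * 4,
    "Time": ["09:20:10", "13:22:10", "14:55:40", "12:54:40"],
    "last_price": [84.40, 86.35, 87.10, np.nan],
})

# Newest first within each symbol: rows later than the cutoff come before it.
df = df.sort_values(["tradingsymbol", "Time"], ascending=[True, False])

# 1 on the cutoff row (the only missing price), 0 elsewhere.
df["flag"] = df["last_price"].isna().astype(int)

# Running count per symbol: stays 0 until the cutoff row, then becomes 1.
df["flag"] = df.groupby("tradingsymbol")["flag"].cumsum()
print(df)  # flag column reads 0, 0, 1, 1 from top to bottom

# Keep the rows above the cutoff and restore chronological order.
kept = df[df["flag"] == 0].sort_values("Time").drop(columns="flag")
print(kept)
```

Everything at or below the cutoff row gets a nonzero count, so filtering on flag == 0 keeps exactly the rows later than the cutoff.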
import io
import numpy as np
import pandas as pd
# Sample creation
s1 = '''tradingsymbol,Time,last_price
AAAA,09:20:10,84.40
AAAA,09:20:10,85.95
AAAA,12:55:60,84.70
AAAA,13:22:10,86.35
AAAA,14:55:40,87.10
BBBB,09:20:13,88.95
BBBB,09:20:13,88.80
BBBB,09:20:14,88.30
BBBB,14:23:11,87.30
CCCC,09:20:15,90.15
CCCC,09:20:16,90.10
CCCC,09:20:17,91.05
CCCC,09:20:18,90.95'''
s2 = '''tradingsymbol,Time
AAAA,12:54:40
BBBB,12:53:33
CCCC,12:51:50'''
df1 = pd.read_csv(io.StringIO(s1), dtype={'last_price': np.float64})
df1.Time = pd.to_datetime(df1.Time, format='%H:%M:%S').dt.time
df2 = pd.read_csv(io.StringIO(s2))
df2.Time = pd.to_datetime(df2.Time, format='%H:%M:%S').dt.time
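# Stack df2 under df1: the df2 rows act as cutoff markers with NaN in last_price.
# Sort each symbol newest first, so the cutoff row sits just below the rows to keep.
df = pd.concat([df1, df2], axis=0).sort_values(['tradingsymbol', 'Time'], ascending=[True, False])
# Mark the cutoff rows (the only rows with a missing last_price).
df['flag'] = df.last_price.isnull()
# Running count per symbol: 0 above the cutoff, >= 1 from the cutoff downward.
df.flag = df.groupby('tradingsymbol').flag.cumsum()
# Keep the flag-0 rows, restore chronological order and drop the helper column.
df = df[df.flag==0].sort_values(['tradingsymbol', 'Time']).drop('flag', axis=1)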
Result:
tradingsymbol Time last_price
2 AAAA 12:56:00 84.7
3 AAAA 13:22:10 86.35
4 AAAA 14:55:40 87.1
8 BBBB 14:23:11 87.3
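A shorter alternative, not part of the original answer, is to look up each row's cutoff time with Series.map and filter with a boolean mask. This sketch keeps Time as zero-padded HH:MM:SS strings (which compare chronologically), which also sidesteps parsing the out-of-range 12:55:60 value from the sample; the names `cutoff` and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question; Time stays as zero-padded "HH:MM:SS" strings,
# which compare in chronological order without any parsing.
df1 = pd.DataFrame({
    "tradingsymbol": ["AAAA"] * 5 + ["BBBB"] * 4 + ["CCCC"] * 4,
    "Time": ["09:20:10", "09:20:10", "12:55:60", "13:22:10", "14:55:40",
             "09:20:13", "09:20:13", "09:20:14", "14:23:11",
             "09:20:15", "09:20:16", "09:20:17", "09:20:18"],
    "last_price": [84.40, 85.95, 84.70, 86.35, 87.10,
                   88.95, 88.80, 88.30, 87.30,
                   90.15, 90.10, 91.05, 90.95],
})
df2 = pd.DataFrame({
    "tradingsymbol": ["AAAA", "BBBB", "CCCC"],
    "Time": ["12:54:40", "12:53:33", "12:51:50"],
})

# Look up each row's cutoff time by symbol (result is aligned to df1's index).
cutoff = df1["tradingsymbol"].map(df2.set_index("tradingsymbol")["Time"])

# Keep only rows strictly later than their symbol's cutoff.
result = df1[df1["Time"] > cutoff]
print(result)
```

Because the mask is applied directly to df1, its original index is preserved, so the kept rows are 2, 3, 4 and 8 as in the desired output, with no concat or re-sort needed.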