Remove previous rows from dataframe based on condition

I have two dataframes, df1 (the primary dataframe) and df2. I want to drop all earlier rows from df1 based on a condition from df2. My dataframes look like this:

df2

           tradingsymbol      Time
0                   AAAA  12:54:40
1                   BBBB  12:53:33
2                   CCCC  12:51:50

df1.head(20)

            tradingsymbol      Time  last_price
0                    AAAA  09:20:10       84.40
1                    AAAA  09:20:10       85.95
2                    AAAA  12:55:60       84.70 <-Valid Row
3                    AAAA  13:22:10       86.35 <-Valid Row
4                    AAAA  14:55:40       87.10 <-Valid Row

5                    BBBB  09:20:13       88.95
6                    BBBB  09:20:13       88.80
7                    BBBB  09:20:14       88.30
8                    BBBB  14:23:11       87.30 <-Valid Row

9                    CCCC  09:20:15       90.15
10                   CCCC  09:20:16       90.10
11                   CCCC  09:20:17       91.05
12                   CCCC  09:20:18       90.95

For each tradingsymbol, I want to remove all rows from df1 whose Time is earlier than that symbol's Time in df2. I want my result to look like this:

            tradingsymbol      Time  last_price
2                    AAAA  12:55:60       84.70
3                    AAAA  13:22:10       86.35
4                    AAAA  14:55:40       87.10
8                    BBBB  14:23:11       87.30

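You can use pd.concat to append df2's rows to df1, sort by tradingsymbol and descending Time, and flag the rows to remove. The df2 rows have no last_price, so a per-symbol cumulative sum of that null flag is 0 for rows later than the cutoff and at least 1 for the cutoff row and everything earlier.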
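To make the flag idea concrete, here is a minimal sketch on a single symbol. The four rows are a hypothetical cut-down of the question's data, with times kept as zero-padded strings so they sort correctly as text; it only shows the intermediate flag column, not the final filtering.

```python
import pandas as pd

# Hypothetical single-symbol example: three price rows plus one cutoff row.
# The cutoff row has no last_price, as happens after pd.concat with df2.
df = pd.DataFrame({
    "tradingsymbol": ["AAAA"] * 4,
    "Time": ["09:20:10", "13:22:10", "14:55:40", "12:54:40"],
    "last_price": [84.40, 86.35, 87.10, None],
})

# Latest rows first, so the cutoff row sits directly above the earlier rows.
df = df.sort_values(["tradingsymbol", "Time"], ascending=[True, False])

# True only on the cutoff row.
df["flag"] = df["last_price"].isna()

# Running count per symbol: 0 above the cutoff row, 1 from the cutoff row down.
df["flag"] = df.groupby("tradingsymbol")["flag"].cumsum()
print(df)
```

Rows with flag 0 are the ones later than the cutoff; the cutoff row itself and every earlier row carry a nonzero flag and are filtered out.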

Code

import io
import numpy as np
import pandas as pd

# Sample creation
s1 = '''tradingsymbol,Time,last_price
AAAA,09:20:10,84.40
AAAA,09:20:10,85.95
AAAA,12:55:60,84.70
AAAA,13:22:10,86.35
AAAA,14:55:40,87.10
BBBB,09:20:13,88.95
BBBB,09:20:13,88.80
BBBB,09:20:14,88.30
BBBB,14:23:11,87.30
CCCC,09:20:15,90.15
CCCC,09:20:16,90.10
CCCC,09:20:17,91.05
CCCC,09:20:18,90.95'''

s2 = '''tradingsymbol,Time
AAAA,12:54:40
BBBB,12:53:33
CCCC,12:51:50'''

df1 = pd.read_csv(io.StringIO(s1), dtype={'last_price': np.float64})
df1.Time = pd.to_datetime(df1.Time, format='%H:%M:%S').dt.time

df2 = pd.read_csv(io.StringIO(s2))
df2.Time = pd.to_datetime(df2.Time, format='%H:%M:%S').dt.time

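# Operations to remove specific rows
# Append the cutoff rows (df2) and sort each symbol from latest to earliest
df = pd.concat([df1, df2], axis=0).sort_values(['tradingsymbol', 'Time'], ascending=[True, False])
# Cutoff rows come from df2 and therefore have no last_price
df['flag'] = df.last_price.isnull()
# Running count per symbol: stays 0 until the cutoff row is reached
df.flag = df.groupby('tradingsymbol').flag.cumsum()
# Keep rows later than the cutoff, restore ascending order, drop the helper column
df = df[df.flag==0].sort_values(['tradingsymbol', 'Time']).drop('flag', axis=1)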

Output

  tradingsymbol      Time  last_price
2          AAAA  12:56:00        84.7
3          AAAA  13:22:10       86.35
4          AAAA  14:55:40        87.1
8          BBBB  14:23:11        87.3
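As an alternative to the flag approach, and not part of the answer above, the same filtering can be sketched with a merge: attach each symbol's cutoff to df1 and compare times directly. This sketch assumes df2 has exactly one row per tradingsymbol and that times are zero-padded HH:MM:SS strings; the column name cutoff is introduced only for illustration, and 12:56:00 is used in place of the question's 12:55:60.

```python
import pandas as pd

# Small sample mirroring the question's data (assumed, abbreviated).
df1 = pd.DataFrame({
    "tradingsymbol": ["AAAA", "AAAA", "AAAA", "BBBB", "BBBB", "CCCC"],
    "Time": ["09:20:10", "12:56:00", "13:22:10", "09:20:13", "14:23:11", "09:20:15"],
    "last_price": [84.40, 84.70, 86.35, 88.95, 87.30, 90.15],
})
df2 = pd.DataFrame({
    "tradingsymbol": ["AAAA", "BBBB", "CCCC"],
    "Time": ["12:54:40", "12:53:33", "12:51:50"],
})

# Attach each symbol's cutoff to every df1 row; a left merge keeps df1's row order.
merged = df1.merge(
    df2.rename(columns={"Time": "cutoff"}), on="tradingsymbol", how="left"
)

# Compare as timedeltas so the test is on durations rather than raw text.
keep = pd.to_timedelta(merged["Time"]) > pd.to_timedelta(merged["cutoff"])

# The mask is positional, so pass its values and keep df1's original index labels.
result = df1[keep.to_numpy()]
print(result)
```

A symbol missing from df2 would get a missing cutoff, the comparison would be False, and its rows would be dropped; whether that is the desired behaviour depends on the data, so treat it as a choice rather than a rule.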
