简体   繁体   中英

How to ignore the successive same values of a column considering pandas dataframe?

I have this pandas dataframe:

去向

I want to create a new column "entry_price" that for each day, it considers the first "buy" value in entry, and writes the associated "open" in this column,

This is an example of what dataframe i want to have: (but maybe there's a better way) 期望的 df

So as you can see, i need to consider only the first "buy" of the day,

I tried with no success this method:

df['entry_price'] = df['open'].where(df['entry'] == "buy")

this method does not ignore the successive "buy" values: does not consider the next "buy" of the same day as a "nan". Any ideas?

We first filter the data that ( entry equals to buy ) and ( not duplicate by date and entry ) , which we can get the first buy by each date , then we insert value with column open .

tbl = {"date" :["2022-02-28", "2022-02-28", "2022-03-01", "2022-03-01"],
       "entry" : ["no", "buy", "buy", "buy"],
       "open" : [1.12, 1.13, 1.135, 1.132]}
df = pd.DataFrame(tbl)
df.sort_values(by = "date", inplace=True)
df.loc[(df["entry"] == "buy" ) & (~df.duplicated(["date", "entry"])), "entry_price"] = df.loc[(df["entry"] == "buy" ) & (~df.duplicated(["date", "entry"])), "open"]

df
    date        entry   open    entry_price
0   2022-02-28  no      1.120   NaN
1   2022-02-28  buy     1.130   1.130
2   2022-03-01  buy     1.135   1.135
3   2022-03-01  buy     1.132   NaN

Since the sample data I generated is a simple one, make sure to sort the data by date first before you check the duplicate rows.

You should actually filter your dataframe only where entry == 'buy' , create a new date format only with day and then use groupby method using only the minimum date

data = {"date": ["2022-02-28 06:00:00", "2022-02-28 06:00:05", "2022-03-01 06:59:35", "2022-03-01 06:59:40"],"entry": ["no", "buy", "buy", "buy"], "open": [1.12, 1.13, 1.135, 1.132]}

df = pd.DataFrame(data)
df["day"] = df["date"].apply(lambda elem: elem.split(" ")[0])
# indentify the dates index
dates = df[df['entry'] == 'buy'].groupby("day")["date"].apply(min)
df[df["date"].isin(dates.values)]

                  date entry   open         day
1  2022-02-28 06:00:05   buy  1.130  2022-02-28
2  2022-03-01 06:59:35   buy  1.135  2022-03-01

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM