Putting NaN when a day in a DataFrame doesn't return a value
I want to get the "Last" value from the "15:30:00.0" row for every day in the DataFrame, but as you can see, on the 16th there are no rows from 13:00:00.0 through 15:30:00.0.
My DataFrame:
Date Time Open High Low Last
0 2023-01-13 09:30:00.0 3968.25 3985 3965.75 3980.25
1 2023-01-13 10:00:00.0 3980 3998.5 3974 3998
2 2023-01-13 10:30:00.0 3998 4000.75 3991.25 3996.75
3 2023-01-13 11:00:00.0 3996.5 3999 3986.25 3992.75
4 2023-01-13 11:30:00.0 3993 3993.5 3985 3990.75
5 2023-01-13 12:00:00.0 3990.75 3998.75 3989.75 3997.5
6 2023-01-13 12:30:00.0 3997.5 4002 3993 3999.75
7 2023-01-13 13:00:00.0 4000 4002.25 3993.75 3997.5
8 2023-01-13 13:30:00.0 3997.25 4010 3996.25 4008.25
9 2023-01-13 14:00:00.0 4008 4010.75 4004.25 4008.75
10 2023-01-13 14:30:00.0 4009 4011.75 4006.25 4009.5
11 2023-01-13 15:00:00.0 4009.75 4016 4009 4016
12 2023-01-13 15:30:00.0 4016 4024.25 4014.75 4017.75
13 2023-01-16 09:30:00.0 4014.75 4019.25 4014.75 4017.5
14 2023-01-16 10:00:00.0 4017.75 4020 4015.5 4017.25
15 2023-01-16 10:30:00.0 4017 4020.5 4017 4018.25
16 2023-01-16 11:00:00.0 4018 4019.5 4015.75 4016.75
17 2023-01-16 11:30:00.0 4016.75 4017 4010.5 4012
18 2023-01-16 12:00:00.0 4012.25 4013 4010.75 4010.75
19 2023-01-16 12:30:00.0 4010.75 4015 4008 4010
20 2023-01-17 09:30:00.0 4018 4024.25 4008.75 4018.25
21 2023-01-17 10:00:00.0 4018.5 4035.25 4018.5 4030.25
22 2023-01-17 10:30:00.0 4030.25 4031.25 4010.5 4014.75
23 2023-01-17 11:00:00.0 4014.75 4017.25 4002.75 4009.5
24 2023-01-17 11:30:00.0 4009.25 4016.25 4008.25 4014.5
25 2023-01-17 12:00:00.0 4014.75 4019 4007.25 4008.25
26 2023-01-17 12:30:00.0 4008.5 4016 4007.75 4013.5
27 2023-01-17 13:00:00.0 4013.75 4016.5 4011.5 4014
28 2023-01-17 13:30:00.0 4014.25 4020.5 4012.75 4019
29 2023-01-17 14:00:00.0 4019.25 4021 4008.25 4010.75
30 2023-01-17 14:30:00.0 4011 4019.5 4010.75 4013.75
31 2023-01-17 15:00:00.0 4013.75 4018.25 4010.25 4012
32 2023-01-17 15:30:00.0 4011.75 4014.25 4003.75 4010
If I use the code below to pull that interval:
m = df["Time"].eq("15:30:00.0")
out = df[m].groupby(["Date", "Time"], as_index=False)["Last"].max()
Output:
Date Time Last
0 2023-01-13 15:30:00.0 4017.75
1 2023-01-17 15:30:00.0 4010
Is it possible to keep the 16th in the result, so that the day is still listed but shows NaN because it has no 15:30:00.0 value?
My desired output:
Date Time Last
0 2023-01-13 15:30:00.0 4017.75
1 2023-01-16 15:30:00.0 NaN
2 2023-01-17 15:30:00.0 4010
You can use .combine_first() with a base DataFrame that holds every date from the original DataFrame, extracted with .unique():
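# One row per date that appears anywhere in df, all tagged with the target time
base = pd.DataFrame({"Date": df["Date"].unique(), "Time": "15:30:00.0"})
# Take "Last" from out where it exists; dates missing from out end up with NaN
base.set_index("Date").combine_first(out.set_index("Date")).reset_index()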
This outputs:
Date Last Time
0 2023-01-13 4017.75 15:30:00.0
1 2023-01-16 NaN 15:30:00.0
2 2023-01-17 4010.00 15:30:00.0
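If you would rather keep the Date, Time, Last column order from the desired output, a left merge against the same kind of base frame is one alternative. This is only a minimal sketch on a trimmed-down, hypothetical sample of the question's data; the names target, hits and result are made up for illustration and do not come from the original posts.

```python
import pandas as pd

# Trimmed-down sample: the 16th has no 15:30:00.0 row
df = pd.DataFrame({
    "Date": ["2023-01-13", "2023-01-13", "2023-01-16", "2023-01-17", "2023-01-17"],
    "Time": ["15:00:00.0", "15:30:00.0", "12:30:00.0", "15:00:00.0", "15:30:00.0"],
    "Last": [4016.0, 4017.75, 4010.0, 4012.0, 4010.0],
})

target = "15:30:00.0"

# One row per date that appears anywhere in df
base = pd.DataFrame({"Date": df["Date"].unique(), "Time": target})

# Rows that actually exist at the target time (same aggregation as the question)
hits = (
    df[df["Time"].eq(target)]
    .groupby(["Date", "Time"], as_index=False)["Last"]
    .max()
)

# Left merge keeps every date in base; dates without a match get NaN in "Last"
result = base.merge(hits, on=["Date", "Time"], how="left")
print(result)
```

Because base is built from df["Date"].unique(), only days that appear somewhere in the data are listed, so weekend dates that never occur are not added.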
Another suggestion is to pass dropna=False to groupby:
out = df[m].groupby(["Date", "Time"], as_index=False, dropna=False)["Last"].max()
Note, however, that dropna=False only keeps groups whose key is itself NaN. It does not create rows for dates that have no 15:30:00.0 entry, so on its own it will not add the 16th (nor the 14th and 15th). The other answer provides a mechanism for that.
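To see how dropna=False behaves on this kind of data, here is a small sketch on a hypothetical three-row sample. As far as I understand the pandas documentation, the flag only decides whether NaN group keys are kept, so the second half of the sketch uses reindex (my own substitution, not something proposed in the thread) to bring the missing day back. The name per_day is made up for illustration.

```python
import pandas as pd

# Hypothetical sample: the 16th has data, but nothing at 15:30:00.0
df = pd.DataFrame({
    "Date": ["2023-01-13", "2023-01-16", "2023-01-17"],
    "Time": ["15:30:00.0", "12:30:00.0", "15:30:00.0"],
    "Last": [4017.75, 4010.0, 4010.0],
})

m = df["Time"].eq("15:30:00.0")

# dropna=False keeps groups whose key is NaN; it does not invent new keys,
# so the 16th is still absent from this result
out = df[m].groupby(["Date", "Time"], as_index=False, dropna=False)["Last"].max()

# Reindexing on the dates seen in df adds the missing day without adding
# calendar days (such as the 14th and 15th) that never appear in the data
per_day = (
    out.set_index("Date")["Last"]
    .reindex(df["Date"].unique())
    .rename_axis("Date")
    .reset_index()
)

print(out)
print(per_day)
```

In this sketch out contains only the 13th and the 17th, while per_day lists all three dates with NaN for the 16th.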