How to iterate over pandas dataframe and check for the day in datetimeindex
I have a large dataframe with this datetimeindex:
... Date A B
190 2019-09-13 21:50:00 1 2
191 2019-09-13 21:55:00 3 2
192 2019-09-13 22:00:00 1 2
193 2019-09-13 22:05:00 3 2
194 2019-09-13 22:10:00 1 2
195 2019-09-16 06:00:00 1 2
196 2019-09-16 06:05:00 1 2
197 2019-09-16 06:10:00 4 2
198 2019-09-16 06:15:00 1 2
199 2019-09-16 06:20:00 4 2
200 2019-09-16 06:25:00 1 2
.....
Name: Date, dtype: datetime64[ns]
Now I need to check whether A is larger than or equal to B, but record only the first occurrence on each day. How can I make sure the list is filled with only the first hit per day? This is my current loop, which appends every hit:
count = []
for i in df.index:
    if df.A[i] >= df.B[i]:
        count.append('A is larger than B' + f" on {df.Date[i]}")
My desired output for this example would be:
A is larger than B on 2019-09-13 21:55:00
A is larger than B on 2019-09-16 06:10:00
You can first filter the rows with Series.ge (greater than or equal, >=) and boolean indexing, then take the first matching row per day by grouping on Series.dt.date and calling GroupBy.first:
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
# boolean mask: True where A >= B
m = df['A'].ge(df['B'])
# first matching row for each calendar day
df1 = df[m].groupby(df['Date'].dt.date).first()
print(df1)
Date A B
Date
2019-09-13 2019-09-13 21:55:00 3 2
2019-09-16 2019-09-16 06:10:00 4 2
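To try the groupby approach end to end, here is a minimal sketch that rebuilds the sample rows from the question and produces the requested messages. The frame construction is an assumption about how the real data is stored (Date as an ordinary column with an integer index), and the names hits, first_per_day and messages are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; the real frame is assumed to look like this.
df = pd.DataFrame(
    {
        "Date": [
            "2019-09-13 21:50:00", "2019-09-13 21:55:00", "2019-09-13 22:00:00",
            "2019-09-13 22:05:00", "2019-09-13 22:10:00", "2019-09-16 06:00:00",
            "2019-09-16 06:05:00", "2019-09-16 06:10:00", "2019-09-16 06:15:00",
            "2019-09-16 06:20:00", "2019-09-16 06:25:00",
        ],
        "A": [1, 3, 1, 3, 1, 1, 1, 4, 1, 4, 1],
        "B": [2] * 11,
    },
    index=range(190, 201),
)
df["Date"] = pd.to_datetime(df["Date"])

# Keep only rows where A >= B, then take the first timestamp per calendar day.
hits = df[df["A"].ge(df["B"])]
first_per_day = hits.groupby(hits["Date"].dt.date)["Date"].first()

# One message per day, built from the first hit of that day.
messages = [f"A is larger than B on {ts}" for ts in first_per_day]
print("\n".join(messages))
```

Selecting only the Date column before calling first() keeps the result as a Series of timestamps, which is all the messages need.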
Alternatively, create a helper column holding the dates and then use DataFrame.drop_duplicates, which keeps the first row for each date by default:
df['Date'] = pd.to_datetime(df['Date'])
# helper column with the calendar date of each row
df['d'] = df['Date'].dt.date
m = df['A'].ge(df['B'])
# keep only the first matching row per date
df1 = df[m].drop_duplicates('d')
print(df1)
Date A B d
191 2019-09-13 21:55:00 3 2 2019-09-13
197 2019-09-16 06:10:00 4 2 2019-09-16
for d in df1.Date:
    print('A is larger than B' + f" on {d}")
A is larger than B on 2019-09-13 21:55:00
A is larger than B on 2019-09-16 06:10:00
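The title mentions a DatetimeIndex, while the answers above treat Date as a regular column. If the timestamps really live in the index, a similar result can be sketched without a helper column. This is an assumption about the data layout, not something the question confirms, and the sample below uses only a few of the question's rows.

```python
import pandas as pd

# Hypothetical layout: timestamps stored in the index instead of a Date column.
idx = pd.to_datetime([
    "2019-09-13 21:50:00", "2019-09-13 21:55:00", "2019-09-13 22:05:00",
    "2019-09-16 06:00:00", "2019-09-16 06:10:00", "2019-09-16 06:20:00",
])
df = pd.DataFrame({"A": [1, 3, 3, 1, 4, 4], "B": [2] * 6}, index=idx)

# Rows where A >= B.
hits = df[df["A"] >= df["B"]]

# normalize() floors each timestamp to midnight, so duplicated() flags
# every hit after the first one on the same calendar day.
first_hits = hits[~hits.index.normalize().duplicated()]

count = [f"A is larger than B on {ts}" for ts in first_hits.index]
print(count)
```

This relies on the rows already being in chronological order, as they are in the question; otherwise calling sort_index() first would be needed.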