Find the latest datetime for each date in a pandas DataFrame

I have a folder on my computer that contains ~8500 .csv files, each named after a stock ticker. Each .csv file has a 'timestamp' column and a 'users_holding' column. I have the 'timestamp' column set up as a datetime index, since it holds hourly entries for each day, e.g. 2019-12-01 01:50, 2020-01-01 02:55, ..., 2020-01-01 01:45, etc. Each timestamp has a corresponding integer giving the number of users holding at that time. I want to write a for loop that iterates through all of the .csv files and, for every day from February 1st, 2020 (2020-02-01) through the last day in the .csv file, tallies up the total users holding across all .csv files at the latest time of that day. The folder updates daily, so I can't hard-code an end date.

This is the for loop I have set up to load each ticker as a dataframe:

import glob
import pandas as pd

path = 'C:\\Users\\N****\\Desktop\\r******\\t**\\p*********\\'
all_files = glob.glob(path + "/*.csv")

for filename in all_files:
    df = pd.read_csv(filename, header = 0, parse_dates = ['timestamp'], index_col='timestamp')

If anyone could show me how to write the for loop that finds the latest entry for each date and tallies up that number for each day, that would be amazing.

Thank you!

First, create a data frame with a datetime index (in one-hour steps):

import numpy as np
import pandas as pd

idx = pd.date_range(start='2020-01-01', end='2020-01-31', freq='H')
data = np.arange(len(idx) * 3).reshape(len(idx), 3)
columns = ['ticker-1', 'ticker-2', 'ticker-3']
df = pd.DataFrame(data=data, index=idx, columns=columns)

print(df.head())

                     ticker-1  ticker-2  ticker-3
2020-01-01 00:00:00         0         1         2
2020-01-01 01:00:00         3         4         5
2020-01-01 02:00:00         6         7         8
2020-01-01 03:00:00         9        10        11
2020-01-01 04:00:00        12        13        14

Then group by the index, keeping year-month-day but dropping hours-minutes-seconds. The aggregation function is .last():

result = (df.groupby(by=df.index.strftime('%Y-%m-%d'))
          [['ticker-1', 'ticker-2', 'ticker-3']]
          .last()
         )

print(result.head())

            ticker-1  ticker-2  ticker-3
2020-01-01        69        70        71
2020-01-02       141       142       143
2020-01-03       213       214       215
2020-01-04       285       286       287
2020-01-05       357       358       359
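
To connect this back to the folder of per-ticker files in the question, here is a rough sketch of how the same group-by-day-then-last idea could be applied file by file and summed. It is only one possible approach, not a tested solution for the real data: the function names (latest_per_day, total_users_per_day, load_frames) are made up for illustration, the two small frames stand in for real .csv files, and it assumes that a ticker with no rows on a given day simply contributes nothing to that day's total.

```python
import glob
import os

import pandas as pd


def latest_per_day(df, start="2020-02-01"):
    """Return the last 'users_holding' value of each calendar day from `start` on."""
    s = df["users_holding"].sort_index()    # .last() relies on chronological order
    s = s[s.index >= pd.Timestamp(start)]   # drop everything before the start date
    return s.groupby(s.index.normalize()).last()


def total_users_per_day(frames, start="2020-02-01"):
    """Sum the per-day latest values over all tickers."""
    daily = [latest_per_day(df, start) for df in frames]
    # Align on the date index; assumption: a missing day for a ticker adds nothing.
    return pd.concat(daily, axis=1).sum(axis=1).sort_index()


def load_frames(folder):
    """Yield one data frame per .csv file in `folder` (hypothetical helper)."""
    for filename in glob.glob(os.path.join(folder, "*.csv")):
        yield pd.read_csv(filename, header=0, parse_dates=["timestamp"],
                          index_col="timestamp")


# Small in-memory example standing in for two ticker files.
a = pd.DataFrame(
    {"users_holding": [5, 10, 20, 30]},
    index=pd.to_datetime(["2020-01-31 23:50", "2020-02-01 01:50",
                          "2020-02-01 22:45", "2020-02-02 03:10"]),
)
b = pd.DataFrame(
    {"users_holding": [1, 2]},
    index=pd.to_datetime(["2020-02-01 09:00", "2020-02-01 21:00"]),
)

totals = total_users_per_day([a, b])
print(totals)
```

For the real folder, something like total_users_per_day(load_frames(path)) should give one total per day. Since there is no end date in the filter, newly added days are picked up whenever the folder is re-read.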
