Pandas dataframe - How to synthetically add a unique timestamp to an existing date column that contains only a date but no time?

I have a simple dataframe with a string index.
The strings represent dates (e.g. 2018-01-01) and contain duplicates.
Applying pd.to_datetime() takes me in the right direction and converts the index from string type to datetime type.
However, it does not solve the duplication problem.
Ideally, I would like to synthetically add a unique time (%H:%M:%S) to each index entry.
Can you please guide me on how to achieve that?

Here is a simple example of what I'm trying to achieve:

import pandas as pd
df = pd.DataFrame(index = ['2018-01-01', '2018-01-01', '2018-01-01'], 
                  columns = ['A', 'B', 'C'] ).fillna(0)

That yields the following dataframe:

            A  B  C
2018-01-01  0  0  0
2018-01-01  0  0  0
2018-01-01  0  0  0

I would like to convert it into something like this (a unique datetime index):

                     A  B  C
2018-01-01 00:00:01  0  0  0
2018-01-01 00:00:02  0  0  0
2018-01-01 00:00:03  0  0  0

Thanks in advance,
Shahar

If the index contains only one distinct date, use to_datetime with the unit and origin parameters, taking the first value of the index as the origin, and assign the result as the new index with DataFrame.set_index:

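import numpy as np

# Offsets 0, 1, 2, ... seconds, counted from the first (and only) date in the index
df = df.set_index(pd.to_datetime(np.arange(len(df)), 
                                 unit='s', 
                                 origin=df.index[0]))
print(df)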
                     A  B  C
2018-01-01 00:00:00  0  0  0
2018-01-01 00:00:01  0  0  0
2018-01-01 00:00:02  0  0  0
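
The same single-date index can probably also be built with pd.date_range. The block below is only a minimal sketch under that assumption: it presumes the index holds one repeated date, and the names new_index and df_single are illustrative, not part of the answer above.

```python
import pandas as pd

# Sample frame from the question: one date repeated three times.
df = pd.DataFrame(0, index=['2018-01-01'] * 3, columns=['A', 'B', 'C'])

# One timestamp per row, one second apart, starting at the first date.
new_index = pd.date_range(start=df.index[0], periods=len(df),
                          freq=pd.Timedelta(seconds=1))
df_single = df.set_index(new_index)

print(df_single)
print(df_single.index.is_unique)  # True
```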

If there are multiple distinct dates in the index, add timedeltas created by GroupBy.cumcount to the DatetimeIndex:

import pandas as pd
df = pd.DataFrame(index = ['2018-01-01', '2018-01-01', '2018-01-01',
                           '2018-02-01', '2018-02-01'], 
                  columns = ['A', 'B', 'C'] ).fillna(0)


df = df.set_index(pd.to_datetime(df.index) + 
                  pd.to_timedelta(df.groupby(level=0).cumcount(), unit='s'))
print(df)
                     A  B  C
2018-01-01 00:00:00  0  0  0
2018-01-01 00:00:01  0  0  0
2018-01-01 00:00:02  0  0  0
2018-02-01 00:00:00  0  0  0
2018-02-01 00:00:01  0  0  0
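
As a sanity check, the cumcount approach can be wrapped in a small helper and the result tested for uniqueness. This is a sketch only: the helper name add_seconds_per_date is hypothetical, and converting the counter to a NumPy array before the addition is a choice made here so that the sum stays a plain DatetimeIndex.

```python
import pandas as pd

def add_seconds_per_date(frame):
    """Return a copy of frame whose index is date + 0, 1, 2, ... seconds per date."""
    dates = pd.to_datetime(frame.index)
    # cumcount numbers the rows inside each date group, in their original order.
    counter = frame.groupby(level=0).cumcount().to_numpy()
    return frame.set_index(dates + pd.to_timedelta(counter, unit='s'))

df = pd.DataFrame(0,
                  index=['2018-01-01', '2018-01-01', '2018-01-01',
                         '2018-02-01', '2018-02-01'],
                  columns=['A', 'B', 'C'])

result = add_seconds_per_date(df)
print(result.index.is_unique)  # True
```

Because the counter restarts for every date, each date begins again at 00:00:00.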

You can use pd.to_datetime in combination with pd.to_timedelta to get the desired result.

Use:

df.index = (
    pd.to_datetime(df.index) + 
    pd.to_timedelta(range(1, len(df) + 1), unit='s'))

print(df)

This prints the resulting dataframe as:

                     A  B  C
2018-01-01 00:00:01  0  0  0
2018-01-01 00:00:02  0  0  0
2018-01-01 00:00:03  0  0  0
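
Note that the offsets in this approach run across the whole frame rather than restarting for each date. The sketch below illustrates that behaviour on an assumed two-date frame; the name df_two and the use of np.arange instead of range are illustrative choices, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Illustrative frame with two different dates.
df_two = pd.DataFrame(0,
                      index=['2018-01-01', '2018-01-01', '2018-02-01'],
                      columns=['A', 'B', 'C'])

# Offsets of 1, 2, 3, ... seconds, one per row, regardless of the date.
offsets = pd.to_timedelta(np.arange(1, len(df_two) + 1), unit='s')
df_two.index = pd.to_datetime(df_two.index) + offsets

print(df_two.index)
# The last row becomes 2018-02-01 00:00:03, not 2018-02-01 00:00:01.
```

The index is still unique; the seconds simply do not start over when the date changes.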

To express your task more generally (for multiple dates):

  • you have a DataFrame with a string index, formatted like dates,
  • you want to convert the index to datetime,
  • but within each date you want to set the time part to consecutive seconds.

To do this, you can run:

df.index = pd.Series(pd.Timedelta('1s'), index=pd.to_datetime(df.index)).groupby(level=0)\
    .transform(lambda grp: grp.cumsum() + grp.index)

Steps:

  • pd.Series(pd.Timedelta('1s'), index=pd.to_datetime(df.index)) - create a Series filled with one-second values, whose index is the index of df converted to datetime, for now still without a time part.
  • groupby(level=0) - group it by date.
  • transform(...) - transform each group with the given lambda function.
  • grp.cumsum() - the time part alone: consecutive seconds (1 s, 2 s, 3 s, ...).
  • + grp.index - add the date part.
  • df.index = ... - set the index of df to this result.

This also works for two dates, even when the dates are "intermixed", for example with the following input:

            A  B  C
2018-01-01  0  0  0
2018-01-01  0  0  0
2018-01-01  0  0  0
2018-01-02  0  0  0
2018-01-02  0  0  0
2018-01-02  0  0  0
2018-01-01  0  0  0

If you have a DataFrame with a single date, you can still use this code (there will simply be only one group).
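
To make the intermixed case concrete, the sketch below applies the same groupby/transform call to the seven-row input shown above. The frame name df_mixed is illustrative, and the expected values in the comments assume a reasonably recent pandas version.

```python
import pandas as pd

# Intermixed input: the last row belongs to the first date again.
df_mixed = pd.DataFrame(0,
                        index=['2018-01-01', '2018-01-01', '2018-01-01',
                               '2018-01-02', '2018-01-02', '2018-01-02',
                               '2018-01-01'],
                        columns=['A', 'B', 'C'])

# One-second steps, accumulated per date and then added to that date.
seconds = pd.Series(pd.Timedelta('1s'), index=pd.to_datetime(df_mixed.index))
df_mixed.index = seconds.groupby(level=0).transform(
    lambda grp: grp.cumsum() + grp.index)

print(df_mixed.index)
# 2018-01-01 gets 00:00:01 .. 00:00:03, 2018-01-02 gets 00:00:01 .. 00:00:03,
# and the trailing 2018-01-01 row continues its own group at 00:00:04.
```

The original row order is preserved; only the time part differs from row to row within a date.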
