Pandas dataframe - How to synthetically add a unique timestamp to an existing date column that contains only dates but no time?

Question

I have a simple dataframe with a string index.
The string represents time (e.g. 2018-01-01), and contains duplicates.
Applying pd.to_datetime() takes me in the right direction, and correctly converts the index from a string type into a datetime type.
However, it does not solve the duplication problem.
Ideally, I would like to synthetically add a unique timestamp (%h:%m:%s) to each index cell.
Can you please guide me on how to achieve that?

Here is a simple example of what I'm trying to achieve:

import pandas as pd
df = pd.DataFrame(index = ['2018-01-01', '2018-01-01', '2018-01-01'], 
                  columns = ['A', 'B', 'C'] ).fillna(0)

That yields the following dataframe:

            A  B  C
2018-01-01  0  0  0
2018-01-01  0  0  0
2018-01-01  0  0  0

I would like to convert it to something like this (unique datetime index):

                     A  B  C
2018-01-01 00:00:01  0  0  0
2018-01-01 00:00:02  0  0  0
2018-01-01 00:00:03  0  0  0

Thanks in advance,
Shahar

Answer 1

If the index holds only a single date (all values are the same), use to_datetime with the unit parameter and with origin set to the first index value, then assign the result as the new index with DataFrame.set_index:

import numpy as np

df = df.set_index(pd.to_datetime(np.arange(len(df)), 
                                 unit='s', 
                                 origin=df.index[0]))
print (df)
                     A  B  C
2018-01-01 00:00:00  0  0  0
2018-01-01 00:00:01  0  0  0
2018-01-01 00:00:02  0  0  0
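
The question asks for times starting at 00:00:01 rather than 00:00:00. A minimal sketch of that variant, assuming every row shares the same date: the only change is that the offsets start at 1. The frame is rebuilt here just to keep the example self-contained.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: three rows, all with the same date string.
df = pd.DataFrame(0, index=['2018-01-01'] * 3, columns=['A', 'B', 'C'])

# Offsets of 1..len(df) seconds, counted from the first (and only) date.
new_index = pd.to_datetime(np.arange(1, len(df) + 1),
                           unit='s',
                           origin=df.index[0])
df = df.set_index(new_index)

print(df.index.is_unique)
print(df)
```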

If the index contains several distinct dates, add timedeltas created by GroupBy.cumcount to the DatetimeIndex:

import pandas as pd
df = pd.DataFrame(index = ['2018-01-01', '2018-01-01', '2018-01-01',
                           '2018-02-01', '2018-02-01'], 
                  columns = ['A', 'B', 'C'] ).fillna(0)


df = df.set_index(pd.to_datetime(df.index) + 
                  pd.to_timedelta(df.groupby(level=0).cumcount(), unit='s'))
print (df)
                     A  B  C
2018-01-01 00:00:00  0  0  0
2018-01-01 00:00:01  0  0  0
2018-01-01 00:00:02  0  0  0
2018-02-01 00:00:00  0  0  0
2018-02-01 00:00:01  0  0  0
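
To see why this restarts the seconds for each date, it may help to look at the intermediate values. The sketch below splits the one-liner into steps; the names position and offsets are illustrative only, and to_numpy() is used here as an assumption-free way to make the addition purely positional.

```python
import pandas as pd

df = pd.DataFrame(0,
                  index=['2018-01-01', '2018-01-01', '2018-01-01',
                         '2018-02-01', '2018-02-01'],
                  columns=['A', 'B', 'C'])

# Position of each row within its own date group: 0, 1, 2, 0, 1.
position = df.groupby(level=0).cumcount()
print(position.tolist())

# Turn the positions into second offsets; to_numpy() drops the string
# index so the addition below does not depend on label alignment.
offsets = pd.to_timedelta(position.to_numpy(), unit='s')

df.index = pd.to_datetime(df.index) + offsets
print(df.index.is_unique)
```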

Answer 2

You can use pd.to_datetime in combination with pd.to_timedelta to get the desired result.

Use:

df.index = (
    pd.to_datetime(df.index) + 
    pd.to_timedelta(range(1, len(df) + 1), unit='s'))

print(df)

This prints the resulting dataframe as:

                     A  B  C
2018-01-01 00:00:01  0  0  0
2018-01-01 00:00:02  0  0  0
2018-01-01 00:00:03  0  0  0
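
One thing worth knowing about this approach: the offsets run over the whole frame, so with several dates the index is still unique but the seconds do not restart for each date. A small sketch on a hypothetical five-row frame with two dates illustrates this (np.arange is used in place of range, which is an equivalent choice here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(0,
                  index=['2018-01-01', '2018-01-01', '2018-01-01',
                         '2018-02-01', '2018-02-01'],
                  columns=['A', 'B', 'C'])

# One running counter for all rows, regardless of date.
df.index = (pd.to_datetime(df.index) +
            pd.to_timedelta(np.arange(1, len(df) + 1), unit='s'))

stamps = df.index.strftime('%Y-%m-%d %H:%M:%S').tolist()
print(stamps)
# The second date continues at 00:00:04 and 00:00:05.
```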

Answer 3

To express your task more generally (for multiple dates):

you have a DataFrame with a string index, formatted like dates,
you want to convert the index to datetime,
but within each date set the time part to consecutive seconds.

To do it you can run:

df.index = pd.Series(pd.Timedelta('1s'), index=pd.to_datetime(df.index)).groupby(level=0)\
    .transform(lambda grp: grp.cumsum() + grp.index)

Steps:

pd.Series(pd.Timedelta('1s'), index=pd.to_datetime(df.index)) - create a Series filled with one-second values, indexed by the index of df converted to datetime, for now still with no time part.
groupby(...) - group it by dates.
transform(...) - transform it with the given lambda function.
grp.cumsum() - the time part alone: consecutive seconds.
+ grp.index - add the date part.
df.index - set the index in df to this result.

This still works for two dates, even when the dates are "intermixed", as in the following frame:

            A  B  C
2018-01-01  0  0  0
2018-01-01  0  0  0
2018-01-01  0  0  0
2018-01-02  0  0  0
2018-01-02  0  0  0
2018-01-02  0  0  0
2018-01-01  0  0  0

If you have a DataFrame with a single date, you can still use this code (you will just have a single group).
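
The intermixed case can be checked with a short sketch. Note that it swaps in GroupBy.cumcount plus one for the transform/cumsum expression above; it is meant to give the same per-date numbering (starting at one second), not to reproduce that expression exactly. The variable names are illustrative.

```python
import pandas as pd

# Two dates, with a trailing row that returns to the first date.
dates = ['2018-01-01', '2018-01-01', '2018-01-01',
         '2018-01-02', '2018-01-02', '2018-01-02',
         '2018-01-01']
df = pd.DataFrame(0, index=dates, columns=['A', 'B', 'C'])

# 1-based position of each row within its date, independent of row order.
seconds = df.groupby(level=0).cumcount().to_numpy() + 1

df.index = pd.to_datetime(df.index) + pd.to_timedelta(seconds, unit='s')

times = df.index.strftime('%H:%M:%S').tolist()
print(times)
# The last row continues the 2018-01-01 group at 00:00:04.
```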

3 answers

Answer 1
2, accepted, 2020-05-15 10:20:24

Answer 2
1, 2020-05-15 10:18:12

Answer 3
1, 2020-05-15 10:44:42