Pandas dataframe - How to synthetically add a unique timestamp for an existing date column, which contains only date but no time?
I have a simple dataframe with a string index.
The strings represent dates (e.g. 2018-01-01) and contain duplicates.
Applying pd.to_datetime() takes me in the right direction: it converts the index from a string type to a datetime type.
However, it does not solve the duplication problem.
Ideally, I would like to synthetically add a unique timestamp (%H:%M:%S) to each index cell.
Can you please guide me on how to achieve that?
Here is a simple example of what I'm trying to achieve:
import pandas as pd
df = pd.DataFrame(index = ['2018-01-01', '2018-01-01', '2018-01-01'],
columns = ['A', 'B', 'C'] ).fillna(0)
That yields the following dataframe:
A B C
2018-01-01 0 0 0
2018-01-01 0 0 0
2018-01-01 0 0 0
I would like to convert it to something like this (unique datetime index):
A B C
2018-01-01 00:00:01 0 0 0
2018-01-01 00:00:02 0 0 0
2018-01-01 00:00:03 0 0 0
Thanks in advance,
Shahar
If the index holds only one distinct date (every value is the same), use to_datetime with the unit and origin parameters, taking the first index value as the origin, and assign the result to the index with DataFrame.set_index:
import numpy as np
df = df.set_index(pd.to_datetime(np.arange(len(df)),
                                 unit='s',
                                 origin=df.index[0]))
print (df)
A B C
2018-01-01 00:00:00 0 0 0
2018-01-01 00:00:01 0 0 0
2018-01-01 00:00:02 0 0 0
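The output above starts at 00:00:00, while the question asks for timestamps starting at 00:00:01. A minimal sketch of one way to shift the counter, assuming the same single-date frame as in the question (the variable names offsets and new_index are only illustrative):

```python
import numpy as np
import pandas as pd

# Same shape as the frame in the question: one date repeated three times.
df = pd.DataFrame(0, index=['2018-01-01'] * 3, columns=['A', 'B', 'C'])

# Offsets of 1, 2, 3, ... seconds, counted from the first (and only) date.
offsets = np.arange(1, len(df) + 1)
new_index = pd.to_datetime(offsets, unit='s', origin=df.index[0])

df = df.set_index(new_index)
print(df)
```

Starting np.arange at 1 instead of 0 is the only change; everything else follows the snippet above.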
If the index contains several distinct dates, add timedeltas created by GroupBy.cumcount to the DatetimeIndex:
import pandas as pd
df = pd.DataFrame(index = ['2018-01-01', '2018-01-01', '2018-01-01',
'2018-02-01', '2018-02-01'],
columns = ['A', 'B', 'C'] ).fillna(0)
df = df.set_index(pd.to_datetime(df.index) +
pd.to_timedelta(df.groupby(level=0).cumcount(), unit='s'))
print (df)
A B C
2018-01-01 00:00:00 0 0 0
2018-01-01 00:00:01 0 0 0
2018-01-01 00:00:02 0 0 0
2018-02-01 00:00:00 0 0 0
2018-02-01 00:00:01 0 0 0
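To confirm that the per-date counter really removes the duplicates, a quick sanity check along these lines may help. This is only a sketch: it rebuilds the same five-row frame and converts the counter with to_numpy() so the addition is positional; the names dates and counter are illustrative.

```python
import pandas as pd

df = pd.DataFrame(0,
                  index=['2018-01-01', '2018-01-01', '2018-01-01',
                         '2018-02-01', '2018-02-01'],
                  columns=['A', 'B', 'C'])

dates = pd.to_datetime(df.index)
# cumcount numbers the rows within each date, restarting at 0 per date.
counter = df.groupby(level=0).cumcount().to_numpy()
df.index = dates + pd.to_timedelta(counter, unit='s')

# The new index has no duplicates ...
print(df.index.is_unique)
# ... and every timestamp still falls on its original date.
print((df.index.normalize() == dates).all())
```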
You can use pd.to_datetime in combination with pd.to_timedelta to get the desired results.
Use:
df.index = (
pd.to_datetime(df.index) +
pd.to_timedelta(range(1, len(df) + 1), unit='s'))
print(df)
This prints the resulting dataframe as:
A B C
2018-01-01 00:00:01 0 0 0
2018-01-01 00:00:02 0 0 0
2018-01-01 00:00:03 0 0 0
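One point worth knowing about this variant: the counter runs over the whole frame, so with more than one date it does not restart for each date. The index is still unique, but the seconds keep increasing across dates. A small sketch with an assumed three-row frame (using np.arange in place of range):

```python
import numpy as np
import pandas as pd

# Assumed example: two rows for one date, one row for another.
df = pd.DataFrame(0,
                  index=['2018-01-01', '2018-01-01', '2018-02-01'],
                  columns=['A', 'B', 'C'])

# One running counter for the whole frame: 1, 2, 3 seconds.
df.index = (pd.to_datetime(df.index) +
            pd.to_timedelta(np.arange(1, len(df) + 1), unit='s'))

print(df.index)
# The single 2018-02-01 row gets 00:00:03, not 00:00:01.
```

If the seconds should restart for every date, a per-group counter such as GroupBy.cumcount is the better fit.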
To express your task more generally (for multiple dates), you can run:
df.index = pd.Series(pd.Timedelta('1s'), index=pd.to_datetime(df.index)).groupby(level=0)\
    .transform(lambda grp: grp.cumsum() + grp.index)
Steps:
pd.Series(pd.Timedelta('1s'), index=pd.to_datetime(df.index)) - create a Series filled with one-second values and indexed by the index of df converted to datetime, for now still with no time part.
groupby(...) - group it by date.
transform(...) - transform each group with the given lambda function.
grp.cumsum() - the time part alone: consecutive seconds.
+ grp.index - add the date part.
df.index = - set the index of df to this result.
The result, for 2 dates, even when the dates are "intermixed", is still OK:
A B C
2018-01-01 0 0 0
2018-01-01 0 0 0
2018-01-01 0 0 0
2018-01-02 0 0 0
2018-01-02 0 0 0
2018-01-02 0 0 0
2018-01-01 0 0 0
If you have a DataFrame with a single date, you can still use this code (you will have a single group only).
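For the "intermixed" case, the same idea can also be wrapped in a small function. The sketch below is not the cumsum-based transform described above; it swaps in GroupBy.cumcount to number the rows within each date. The function name add_unique_seconds and its start parameter are hypothetical, chosen only for this illustration.

```python
import pandas as pd


def add_unique_seconds(df, start=1):
    """Hypothetical helper: return a copy of df with a unique datetime index.

    Each row gets its date plus a per-date counter in seconds,
    beginning at `start`.
    """
    out = df.copy()
    # Position of each row within its own date: 0, 1, 2, ...
    position = out.groupby(level=0).cumcount().to_numpy()
    out.index = (pd.to_datetime(out.index) +
                 pd.to_timedelta(position + start, unit='s'))
    return out


# Two dates, with one 2018-01-01 row placed after the 2018-01-02 rows.
df = pd.DataFrame(0,
                  index=['2018-01-01', '2018-01-01', '2018-01-01',
                         '2018-01-02', '2018-01-02', '2018-01-02',
                         '2018-01-01'],
                  columns=['A', 'B', 'C'])

result = add_unique_seconds(df)
print(result)
# The last row continues the 2018-01-01 counter at 00:00:04.
```

Because the counter is computed per date and the row order is left untouched, the late 2018-01-01 row simply receives the next free second for that date.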