Convert string date to a different format in pandas dataframe

I have searched the community for an answer to this but have not found one.

I have a dataframe in Python 3.5.1 that contains a column of dates stored as strings, imported from a CSV file.

The dataframe looks like this:

                  TimeStamp  TBD  TBD     Value  TBD
0       2016/06/08 17:19:53  NaN  NaN  0.062942  NaN
1       2016/06/08 17:19:54  NaN  NaN  0.062942  NaN
2       2016/06/08 17:19:54  NaN  NaN  0.062942  NaN

What I need is to change the TimeStamp column format to %m/%d/%Y %H:%M:%S:

                  TimeStamp  TBD  TBD     Value  TBD
0       06/08/2016 17:19:53  NaN  NaN  0.062942  NaN

So far I have found some solutions that work, but only for a single string and not for a Series.

Any help would be appreciated.

Thanks

If you convert the column of strings to a time series, you can use the dt.strftime method:

import numpy as np
import pandas as pd
nan = np.nan
df = pd.DataFrame({'TBD': [nan, nan, nan], 'TBD.1': [nan, nan, nan], 'TBD.2': [nan, nan, nan], 'TimeStamp': ['2016/06/08 17:19:53', '2016/06/08 17:19:54', '2016/06/08 17:19:54'], 'Value': [0.062941999999999998, 0.062941999999999998, 0.062941999999999998]})
df['TimeStamp'] = pd.to_datetime(df['TimeStamp']).dt.strftime('%m/%d/%Y %H:%M:%S')
print(df)

yields

   TBD  TBD.1  TBD.2            TimeStamp     Value
0  NaN    NaN    NaN  06/08/2016 17:19:53  0.062942
1  NaN    NaN    NaN  06/08/2016 17:19:54  0.062942
2  NaN    NaN    NaN  06/08/2016 17:19:54  0.062942
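
One thing worth noting about the approach above: dt.strftime returns strings, so the reformatted column is no longer a datetime column. A minimal sketch, assuming you still want real datetimes for sorting or arithmetic, is to keep the parsed values in a separate column (the names Parsed and Formatted are made up for this illustration):

```python
import pandas as pd

df = pd.DataFrame({'TimeStamp': ['2016/06/08 17:19:53', '2016/06/08 17:19:54']})

# Parse once and keep the result as a datetime column (hypothetical name 'Parsed').
df['Parsed'] = pd.to_datetime(df['TimeStamp'])

# Build the display strings separately (hypothetical name 'Formatted').
df['Formatted'] = df['Parsed'].dt.strftime('%m/%d/%Y %H:%M:%S')

# 'Parsed' has a datetime dtype; 'Formatted' holds plain strings.
print(df.dtypes)
```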

Since you want to convert a column of strings to another (different) column of strings, you could also use the vectorized str.replace method:

import numpy as np
import pandas as pd
nan = np.nan
df = pd.DataFrame({'TBD': [nan, nan, nan], 'TBD.1': [nan, nan, nan], 'TBD.2': [nan, nan, nan], 'TimeStamp': ['2016/06/08 17:19:53', '2016/06/08 17:19:54', '2016/06/08 17:19:54'], 'Value': [0.062941999999999998, 0.062941999999999998, 0.062941999999999998]})
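# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
df['TimeStamp'] = df['TimeStamp'].str.replace(r'(\d+)/(\d+)/(\d+)(.*)', r'\2/\3/\1\4', regex=True)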
print(df)

since

In [32]: df['TimeStamp'].str.replace(r'(\d+)/(\d+)/(\d+)(.*)', r'\2/\3/\1\4', regex=True)
Out[32]: 
0    06/08/2016 17:19:53
1    06/08/2016 17:19:54
2    06/08/2016 17:19:54
Name: TimeStamp, dtype: object

This uses regex to rearrange pieces of the string without first parsing the string as a date. This is faster than the first method (mainly because it skips the parsing step), but it has the disadvantage of not checking that the date strings are valid dates.
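
To illustrate the validation trade-off, here is a small sketch with one deliberately invalid value (the sample data is made up). The regex version rewrites it without complaint, while parsing with an explicit format and errors='coerce' turns it into NaT so it can be spotted:

```python
import pandas as pd

# The second value is not a real date (month 13, day 45).
s = pd.Series(['2016/06/08 17:19:53', '2016/13/45 17:19:53'])

# Regex rearrangement: both rows are rewritten, valid or not.
rearranged = s.str.replace(r'(\d+)/(\d+)/(\d+)(.*)', r'\2/\3/\1\4', regex=True)

# Parsing: the invalid row becomes NaT instead of passing through silently.
parsed = pd.to_datetime(s, format='%Y/%m/%d %H:%M:%S', errors='coerce')

print(rearranged.tolist())
print(parsed.isna().tolist())
```

Whether the extra speed is worth it probably depends on how much you trust the input file.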

For most common date and datetime formats, the pandas pd.to_datetime function can parse the strings without an explicit format. For example:

df.TimeStamp.apply(lambda x: pd.to_datetime(x))

And for the example given in the question,

df['TimeStamp'] = pd.to_datetime(df['TimeStamp']).dt.strftime('%m/%d/%Y %H:%M:%S')

will give us the same result.

Using .apply is convenient if you have multiple columns to convert.
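
A minimal sketch of the multi-column case, assuming two string columns that share the same layout (Start and End are hypothetical names, not from the question). DataFrame.apply hands each column to pd.to_datetime in turn:

```python
import pandas as pd

# 'Start' and 'End' are hypothetical column names used only for illustration.
df = pd.DataFrame({
    'Start': ['2016/06/08 17:19:53', '2016/06/08 17:19:54'],
    'End': ['2016/06/09 08:00:00', '2016/06/09 08:00:01'],
})
cols = ['Start', 'End']

# Each column (a Series) is passed to pd.to_datetime as a whole.
parsed = df[cols].apply(pd.to_datetime)

# Reformat every parsed column back to strings in the target layout.
formatted = parsed.apply(lambda col: col.dt.strftime('%m/%d/%Y %H:%M:%S'))

print(formatted)
```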

Of course, providing the parsing format is necessary in many situations. For a full list of format codes, please see https://docs.python.org/3/library/datetime.html .
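
As a sketch of when the format matters, consider an ambiguous day-first string (a made-up input, not from the question). Passing format= tells pd.to_datetime which field is the day and which is the month:

```python
import pandas as pd

# Hypothetical day-first input: intended to mean 8 June 2016.
s = pd.Series(['08/06/2016 17:19:53'])

# The explicit format removes the day/month ambiguity.
parsed = pd.to_datetime(s, format='%d/%m/%Y %H:%M:%S')

# Convert to the month-first layout asked for in the question.
formatted = parsed.dt.strftime('%m/%d/%Y %H:%M:%S')
print(formatted.tolist())
```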
