Handling multiple date formats in a pandas DataFrame

Question

I have a dataframe (imported from Excel) which looks like this:

         Date               Period  
0  2017-03-02  2017-03-01 00:00:00  
1  2017-03-02  2017-04-01 00:00:00     
2  2017-03-02  2017-05-01 00:00:00    
3  2017-03-02  2017-06-01 00:00:00    
4  2017-03-02  2017-07-01 00:00:00      
5  2017-03-02  2017-08-01 00:00:00   
6  2017-03-02  2017-09-01 00:00:00    
7  2017-03-02  2017-10-01 00:00:00  
8  2017-03-02  2017-11-01 00:00:00 
9  2017-03-02  2017-12-01 00:00:00 
10 2017-03-02                 Q217 
11 2017-03-02                 Q317  
12 2017-03-02                 Q417 
13 2017-03-02                 Q118 
14 2017-03-02                 Q218 
15 2017-03-02                 Q318 
16 2017-03-02                 Q418 
17 2017-03-02                 2018

I am trying to convert the whole 'Period' column into a consistent format. Some elements are already in datetime format, others were imported as strings (e.g. Q217), and others as int (e.g. 2018). What is the fastest way to convert everything to datetime? I tried some masking, like this:

mask = df['Period'].str.startswith('Q', na = False)
list_quarter = df_final[mask]['Period'].tolist()
quarter_convert = {'1':'31/03', '2':'30/06', '3':'30/09', '4':'31/12'}
counter = 0
for element in list_quarter:
    element = element[1:]
    quarter = element[0]
    year = element[1:]
    daymonth = ''.join(str(quarter_convert.get(word, word)) for word in quarter)
    final = daymonth+'/'+year
    list_quarter[counter] = final
    counter+=1

However, it fails when I try to substitute the modified elements back into the original column:

df_nwe_final['Period'] = np.where(mask, pd.Series(list_quarter), df_nwe_final['Period'])

Of course, I would need to do more or less the same with the 2018-type values. However, I am sure I am missing something here, and there should be a much faster solution. Some fresh ideas would help! Thank you.

Answer 1

Reusing the code you show, let's first write a function that converts a 'Q' string to a date string that pandas can parse (I adjusted the final format slightly):

def convert_q_string(element):
    # Map a label such as 'Q217' to the last day of that quarter, '2017-06-30'.
    quarter_convert = {'1':'03-31', '2':'06-30', '3':'09-30', '4':'12-31'}
    element = element[1:]       # drop the leading 'Q'
    quarter = element[0]        # quarter digit, e.g. '2'
    year = element[1:]          # two-digit year, e.g. '17'
    daymonth = quarter_convert[quarter]
    final = '20' + year + '-' + daymonth
    return final
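
If you would rather not maintain the lookup table by hand, one possible alternative is to let pandas compute the quarter end through pd.Period. This is only a sketch: the helper name quarter_end is made up for illustration, and it assumes that every two-digit year belongs to the 2000s, just like the function above.

```python
import pandas as pd

def quarter_end(label):
    """Map a label such as 'Q217' to the last day of that quarter."""
    quarter, year = label[1], label[2:]
    # Assumption: two-digit years belong to the 2000s ('17' -> 2017).
    period = pd.Period("20" + year + "Q" + quarter)
    # end_time is the last nanosecond of the quarter; normalize() resets
    # the time of day to midnight so only the date remains.
    return period.end_time.normalize()

print(quarter_end("Q217"))
```

Unlike convert_q_string, this returns a Timestamp directly rather than a string.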

We can now use convert_q_string to convert all the 'Q' strings first, and then call pd.to_datetime to turn every element into a proper datetime value:

In [2]: s = pd.Series(['2017-03-01 00:00:00', 'Q217', '2018'])

In [3]: mask = s.str.startswith('Q')

In [4]: s[mask] = s[mask].map(convert_q_string)

In [5]: s
Out[5]: 
0    2017-03-01 00:00:00
1             2017-06-30
2                   2018
dtype: object

In [6]: pd.to_datetime(s)
Out[6]: 
0   2017-03-01
1   2017-06-30
2   2018-01-01
dtype: datetime64[ns]
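
The example above uses a Series that contains only strings. A column imported from Excel will probably hold real datetime objects and ints next to the strings, so here is a rough sketch of a per-cell converter for that case. The names to_timestamp and QUARTER_END are mine, the small dataframe is made-up sample data, and two-digit years are again assumed to mean 20xx. Because each cell is converted explicitly, it does not rely on pd.to_datetime inferring several string formats in one call, which may behave differently across pandas versions.

```python
import datetime as dt
import pandas as pd

# Last day (month-day) of each calendar quarter.
QUARTER_END = {"1": "03-31", "2": "06-30", "3": "09-30", "4": "12-31"}

def to_timestamp(value):
    """Convert one 'Period' cell (datetime, 'Q217'-style string, or year) to a Timestamp."""
    if isinstance(value, dt.datetime):  # also covers pd.Timestamp
        return pd.Timestamp(value)
    text = str(value).strip()
    if text.startswith("Q"):
        # 'Q217' -> quarter '2', two-digit year '17' (assumed to mean 2017)
        return pd.Timestamp("20" + text[2:] + "-" + QUARTER_END[text[1]])
    if text.isdigit() and len(text) == 4:
        # A bare year such as 2018 becomes January 1st of that year.
        return pd.Timestamp(year=int(text), month=1, day=1)
    # Anything else is handed to pandas as a date string.
    return pd.Timestamp(text)

# Made-up sample with the three kinds of values from the question.
df = pd.DataFrame({"Period": [pd.Timestamp("2017-03-01"), "Q217", "Q418", 2018]})
df["Period"] = pd.to_datetime(df["Period"].map(to_timestamp))
print(df)
```

This goes through the column one cell at a time, so it is not the fastest option for very large frames, but it keeps the rule for each kind of value in one place.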

Answer 1: accepted, 2017-11-24 14:56:30