How to extract time from a string in PySpark

Question

I have a string that contains a time in the following pattern, and I want to extract it in PySpark:

......&eventTime=2017-02-22T01%3a02%3a07.1816943Z&......

This is what I tried, but it didn't work. df_event.EventParameters is the column that contains the time.

df_localTime = pyspark.sql.functions \
          .regexp_extract(df_event.EventParameters, '.*(\\d{4}-\\d{2}-\\d{2}T\\d{2}%3a\\d{2}%3a\\d{2}\\.\\{3}).*', 1) \
          .alias('localTime')

Answer 1

The thing that prevents it from matching anything is this part: \\.\\{3}

It basically says:

\. match a literal dot  
\{ match a literal open brace  
3 match a literal three  
} match a literal close brace
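
A quick way to see this is to run the pattern outside Spark. The sketch below uses Python's re module as a stand-in for the Java regex engine that Spark uses; for this particular pattern I'd expect both to behave the same, but that is an assumption. LITERAL is a made-up input, not something from the question.

```python
import re

# Sample value from the question.
SAMPLE = '......&eventTime=2017-02-22T01%3a02%3a07.1816943Z&......'

# The original pattern as the regex engine sees it, once Python has
# resolved the doubled backslashes of the string literal.
BROKEN = r'.*(\d{4}-\d{2}-\d{2}T\d{2}%3a\d{2}%3a\d{2}\.\{3}).*'

# Hypothetical input that really contains the characters ".{3}" after the seconds.
LITERAL = '2017-02-22T01%3a02%3a07.{3}'

broken_on_sample = re.match(BROKEN, SAMPLE)    # the sample has digits after the dot
broken_on_literal = re.match(BROKEN, LITERAL)  # the literal ".{3}" is present

print(broken_on_sample)            # None
print(broken_on_literal.group(1))  # 2017-02-22T01%3a02%3a07.{3}
```

So the pattern is valid, it just looks for the text ".{3}" instead of a dot followed by three digits.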

I assume you meant there to be a \\d instead:
\\.\\d{3}

So, the regex string is now '.*(\\d{4}-\\d{2}-\\d{2}T\\d{2}%3a\\d{2}%3a\\d{2}\\.\\d{3}).*'

which now matches this (group 1 is shown in brackets):

......&eventTime=[2017-02-22T01%3a02%3a07.181]6943Z&......

Formatted (for readability):

 .* 
 (                             # (1 start)
      \d{4} - \d{2} - \d{2} 
      T 
      \d{2} %3a \d{2} %3a \d{2} 
      \. \d{3} 
 )                             # (1 end)
 .*
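
For completeness, here is a minimal sketch of the corrected pattern in use. The helper names extract_local_time and add_local_time are mine, not part of any API. The first checks the pattern with Python's re module, again as a stand-in for Spark's Java regex engine. The second shows roughly how the same pattern string could be passed to regexp_extract, assuming df_event is the DataFrame from the question. I have not run the Spark part against the real data.

```python
import re

# Corrected pattern: a dot followed by three digits ends the capture group.
PATTERN = r'.*(\d{4}-\d{2}-\d{2}T\d{2}%3a\d{2}%3a\d{2}\.\d{3}).*'


def extract_local_time(value):
    """Return group 1 of PATTERN, or '' when nothing matches.

    Returning '' mirrors what regexp_extract does for a non-matching row.
    """
    match = re.match(PATTERN, value)
    return match.group(1) if match else ''


def add_local_time(df_event):
    """Select the extracted time as a 'localTime' column (requires pyspark)."""
    from pyspark.sql import functions as F  # imported lazily so the rest runs without Spark

    return df_event.select(
        F.regexp_extract(df_event.EventParameters, PATTERN, 1).alias('localTime')
    )


sample = '......&eventTime=2017-02-22T01%3a02%3a07.1816943Z&......'
print(extract_local_time(sample))  # 2017-02-22T01%3a02%3a07.181
```

Note that the extracted value still has the URL-encoded colons (%3a) and only the first three fractional digits, so it will likely need further cleanup before it can be parsed as a timestamp.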
