
How to extract time from string in pyspark

I have a string that contains a time in the following pattern, which I want to extract in pyspark:

......&eventTime=2017-02-22T01%3a02%3a07.1816943Z&......

This is what I tried, but it didn't work. df_event.EventParameters is the column that contains the time.

df_localTime = pyspark.sql.functions \
          .regexp_extract(df_event.EventParameters, '.*(\\d{4}-\\d{2}-\\d{2}T\\d{2}%3a\\d{2}%3a\\d{2}\\.\\{3}).*', 1) \
          .alias('localTime')

The thing that prevents it from matching anything is this part: \\.\\{3}

It basically says:

\. match a literal dot  
\{ match a literal open brace  
3 match a literal three  
} match a literal close brace  

I assume you meant there to be a \\d instead:
\\.\\d{3}

So, the regex string is now '.*(\\d{4}-\\d{2}-\\d{2}T\\d{2}%3a\\d{2}%3a\\d{2}\\.\\d{3}).*'
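As a quick sanity check, the two patterns can be compared outside Spark. The sketch below uses Python's `re` module rather than Spark's regex engine, on the assumption that both treat this particular pattern the same way. The `sample` string is a made-up stand-in for one value of the EventParameters column.

```python
import re

# Hypothetical sample value; only the eventTime part comes from the question.
sample = "a=1&eventTime=2017-02-22T01%3a02%3a07.1816943Z&b=2"

# Original pattern: "\.\{3}" demands the literal text ".{3}" after the seconds.
broken = '.*(\\d{4}-\\d{2}-\\d{2}T\\d{2}%3a\\d{2}%3a\\d{2}\\.\\{3}).*'
# Corrected pattern: "\.\d{3}" matches a dot followed by three digits.
fixed = '.*(\\d{4}-\\d{2}-\\d{2}T\\d{2}%3a\\d{2}%3a\\d{2}\\.\\d{3}).*'

broken_match = re.match(broken, sample)
fixed_match = re.match(fixed, sample)

print(broken_match)           # None: the original pattern matches nothing
print(fixed_match.group(1))   # 2017-02-22T01%3a02%3a07.181
```

If this holds in Spark as well, passing the corrected string to regexp_extract in the original call should return the same group 1.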

which now matches this (group 1 is shown in square brackets):

......&eventTime=[2017-02-22T01%3a02%3a07.181]6943Z&......

Formatted (for readability):

 .* 
 (                             # (1 start)
      \d{4} - \d{2} - \d{2} 
      T 
      \d{2} %3a \d{2} %3a \d{2} 
      \. \d{3} 
 )                             # (1 end)
 .* 
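The captured group still contains the URL-encoded colons (%3a). If an actual time value is wanted, one possible follow-up step, sketched here in plain Python and not part of the original answer, is to decode the text and parse it. The `extracted` value is assumed to be the group 1 text from the match above.

```python
from datetime import datetime
from urllib.parse import unquote

# Assumed input: the text captured by group 1 of the corrected regex.
extracted = "2017-02-22T01%3a02%3a07.181"

# Turn the percent-encoded "%3a" back into ":".
decoded = unquote(extracted)

# Parse the decoded text; %f accepts the three fractional digits.
parsed = datetime.strptime(decoded, "%Y-%m-%dT%H:%M:%S.%f")

print(decoded)  # 2017-02-22T01:02:07.181
print(parsed)   # 2017-02-22 01:02:07.181000
```

Inside Spark the same decoding would have to be done with column functions or a UDF; this sketch only shows the shape of the transformation.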
