How to extract time from a string in PySpark
I have a string containing a timestamp in the following pattern, and I want to extract the timestamp in PySpark:
......&eventTime=2017-02-22T01%3a02%3a07.1816943Z&......
This is what I tried, but it didn't work. df_event.EventParameters is the column that contains the time:
df_localTime = pyspark.sql.functions \
.regexp_extract(df_event.EventParameters, '.*(\\d{4}-\\d{2}-\\d{2}T\\d{2}%3a\\d{2}%3a\\d{2}\\.\\{3}).*', 1) \
.alias('localTime')
What prevents it from matching anything is this part: \\.\\{3}
It basically says:
\. match a literal dot
\{ match a literal open brace
3 match a literal three
} match a literal close brace
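The effect of the escaped brace can be checked outside Spark with Python's re module. This is a sketch that assumes Java's regex engine (which Spark's regexp_extract uses) treats these particular constructs (\d, \. and an escaped brace) the same way Python's re does. The patterns are written as raw strings, so \d here is the same as '\\d' in the question, and the sample value is the one from the question with the elided "......" parts dropped.

```python
import re

# Sample value from the question (the surrounding "......" parts omitted).
sample = "&eventTime=2017-02-22T01%3a02%3a07.1816943Z&"

# The original pattern: "\{3}" means the literal text "{3}", not "three of something".
broken = r".*(\d{4}-\d{2}-\d{2}T\d{2}%3a\d{2}%3a\d{2}\.\{3}).*"

# The sample never contains ".{3}", so this finds nothing and prints None.
print(re.search(broken, sample))

# The same fragment does match when the literal characters ".{3}" are present.
print(re.search(r"\.\{3}", "07.{3}"))
```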
I assume you meant to put a \\d there instead:
\\.\\d{3}
So the regex string is now '.*(\\d{4}-\\d{2}-\\d{2}T\\d{2}%3a\\d{2}%3a\\d{2}\\.\\d{3}).*'
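A quick check of the corrected pattern, again using Python's re as a stand-in for Spark's Java regex engine (an assumption that holds for the constructs used here). It also shows that the leading and trailing .* are not needed for a search-style match; Spark's regexp_extract appears to search the string rather than require a full match, so they are probably optional there too, though that is worth confirming on your Spark version. In PySpark the same raw string could be passed as the pattern argument of regexp_extract(df_event.EventParameters, pattern, 1).

```python
import re

sample = "&eventTime=2017-02-22T01%3a02%3a07.1816943Z&"

# Corrected pattern: "\.\d{3}" is a literal dot followed by exactly three digits.
fixed = r".*(\d{4}-\d{2}-\d{2}T\d{2}%3a\d{2}%3a\d{2}\.\d{3}).*"

match = re.search(fixed, sample)
print(match.group(1))  # 2017-02-22T01%3a02%3a07.181

# Without the surrounding ".*", re.search still scans the whole string
# and captures the same group.
core = r"(\d{4}-\d{2}-\d{2}T\d{2}%3a\d{2}%3a\d{2}\.\d{3})"
print(re.search(core, sample).group(1))  # 2017-02-22T01%3a02%3a07.181
```

Note that \d{3} keeps only the first three fractional digits (181); the remaining 6943Z is left outside group 1.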
which now matches this (group 1 shown in brackets):
......&eventTime=[2017-02-22T01%3a02%3a07.181]6943Z&......
Formatted for readability:
.*
( # (1 start)
\d{4} - \d{2} - \d{2}
T
\d{2} %3a \d{2} %3a \d{2}
\. \d{3}
) # (1 end)
.*
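The formatted layout above is close to a real "verbose" regex. As a sketch, Python's re.VERBOSE flag ignores whitespace and treats # as the start of a comment, so the layout can be compiled almost unchanged. Java supports a similar comments mode through the embedded (?x) flag, so in principle the same layout could be passed to regexp_extract with a (?x) prefix; that is an untested assumption here.

```python
import re

sample = "&eventTime=2017-02-22T01%3a02%3a07.1816943Z&"

# The formatted pattern, compiled with re.VERBOSE:
# whitespace is ignored and "#" starts a comment that runs to end of line.
verbose = re.compile(r"""
    .*
    (                                # (1 start)
        \d{4} - \d{2} - \d{2}        # date: yyyy-MM-dd
        T
        \d{2} %3a \d{2} %3a \d{2}    # time, with ":" URL-encoded as %3a
        \. \d{3}                     # literal dot, then three fractional digits
    )                                # (1 end)
    .*
""", re.VERBOSE)

print(verbose.search(sample).group(1))  # 2017-02-22T01%3a02%3a07.181
```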