[英]Finding difference between two time stamp in pyspark sql
Below table structure, you can notice the column name 在表格结构下方,您会注意到列名
cal_avg_latency = spark.sql("SELECT UnitType, ROUND(AVG(TIMESTAMP_DIFF(OnSceneDtTmTS, ReceivedDtTmTS, MINUTE)), 2) as latency, count(*) as total_count FROM `SFSC_Incident_Census_view` WHERE EXTRACT(DATE from ReceivedDtTmTS) == EXTRACT(DATE from OnSceneDtTmTS) GROUP BY UnitType ORDER BY latency ASC")
Error: 错误:
ParseException: "\nmismatched input 'FROM' expecting <EOF>(line 1, pos 122)\n\n== SQL ==\nSELECT UnitType, ROUND(AVG(TIMESTAMP_DIFF(OnSceneDtTmTS, ReceivedDtTmTS, MINUTE)), 2) as latency, count(*) as total_count FROM SFSC_Incident_Census_view WHERE EXTRACT((DATE FROM ReceivedDtTmTS) == EXTRACT(DATE FROM OnSceneDtTmTS)) GROUP BY UnitType ORDER BY latency ASC\n--------------------------------------------------------------------------------------------------------------------------^^^\n"
Error is in WHERE condition but even my TIMESTAMP_DIFF function not working 错误处于WHERE状态,但即使我的TIMESTAMP_DIFF函数也不起作用
cal_avg_latency = spark.sql("SELECT UnitType, ROUND(AVG(TIMESTAMP_DIFF(OnSceneDtTmTS, ReceivedDtTmTS, MINUTE)), 2) as latency, count(*) as total_count FROM SFSC_Incident_Census_view GROUP BY UnitType ORDER BY latency ASC")
Error : 错误:
AnalysisException: "Undefined function: 'TIMESTAMP_DIFF'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 27"
The error message seems pretty clear. 该错误消息似乎很清楚。 Hive doesn't have a TIMESTAMP_DIFF
function. 蜂巢没有TIMESTAMP_DIFF
函数。
If your columns are already appropriately cast as a timestamp
type, you can subtract them directly. 如果您的列已经适当地转换为timestamp
类型,则可以直接将其减去。 Otherwise, you can cast them explicity, and take the difference: 否则,您可以将其显式转换,并加以区别:
SELECT ROUND(AVG(MINUTE(CAST(OnSceneDtTmTS AS timestamp) - CAST(ReceivedDtTmTS AS timestamp))), 2) AS latency
I have solve the problem using pyspark query. 我已经使用pyspark查询解决了问题。
from pyspark.sql import functions as F
import pyspark.sql.functions as func
timeFmt = "yyyy-MM-dd'T'HH:mm:ss.SSS"
timeDiff = (F.unix_timestamp('OnSceneDtTmTS', format=timeFmt)
- F.unix_timestamp('ReceivedDtTmTS', format=timeFmt))
FSCDataFrameTsDF = FSCDataFrameTsDF.withColumn("Duration", timeDiff)
#convert seconds to minute and round the seconds for further use.
FSCDataFrameTsDF = FSCDataFrameTsDF.withColumn("Duration_minutes",func.round(FSCDataFrameTsDF.Duration / 60.0))
Output: 输出:
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.