How to do an SQL join on Spark?
I would like to do an SQL join between two tables in Spark, but I get an unexpected error:
>>> cyclistes.printSchema()
root
|-- id: string (nullable = true)
|-- age: string (nullable = true)
(...)
>>> voyages.printSchema()
root
|-- id: string (nullable = true)
|-- vitesse: string (nullable = true)
(...)
>>> requete_sql = """
SELECT c.id, c.age, mean(v.vitesse)
FROM cyclistes as c , voyages as v
WHERE c.id == v.id
GROUP BY c.id
"""
>>> spark.sql(requete_sql)
AnalysisException: "grouping expressions sequence is empty, and
'c.`age`' is not an aggregate function. Wrap '(avg(CAST(v.`vitesse`
AS DOUBLE)) AS `avg(CAST(vitesse AS DOUBLE))`)' in windowing
function(s) or wrap 'c.`age`' in first() (or first_value) if you
don't care which value you get.;
Any idea?
Basic error in the SQL query: every column in the SELECT list that is not in the GROUP BY clause must be aggregated. Since `age` is only grouped implicitly through `id`, wrap it in max():
>>> requete_sql = """
SELECT c.id, max(c.age), mean(v.vitesse)
FROM cyclistes as c, voyages as v
WHERE c.id == v.id
GROUP BY c.id
"""
>>> spark.sql(requete_sql)