
How can I combine rows in Spark (Databricks)?

I'm trying to combine rows in Spark.

The dataset has columns such as Year, ZipCode, and HPI_with_2000_base. I selected three zip codes and their HPI_with_2000_base values. What I want is a single result that contains only the rows for these three zip codes with Year from 2000 onward.

When I ran this, it worked:

df6 = spark.sql("select ZipCode,Year, HPI_with_2000_base from df1 where ZipCode = 94122 or ZipCode = 10583 or ZipCode = 91411")

Resulting dataframe:

+-------+----+------------------+
|ZipCode|Year|HPI_with_2000_base|
+-------+----+------------------+
|  10583|1976|             16.66|
|  10583|1977|             16.81|
|  10583|1978|             18.37|
|  10583|1979|             23.06|
|  10583|1980|             24.37|
|  10583|1981|             30.82|
|  10583|1982|             32.46|
|  10583|1983|             35.25|
|  10583|1984|             42.15|
|  10583|1985|             48.94|
|  10583|1986|             57.22|
|  10583|1987|             66.24|
|  10583|1988|             76.98|
|  10583|1989|             77.28|
|  10583|1990|             74.44|
|  10583|1991|             69.85|
|  10583|1992|             70.86|
|  10583|1993|             70.98|
|  10583|1994|             71.39|
|  10583|1995|             71.27|
+-------+----+------------------+
only showing top 20 rows

However, when I ran this, it failed:

df6 = spark.sql("select ZipCode,Year, HPI_with_2000_base from df1 where ZipCode = 94122 or ZipCode = 10583 or ZipCode = 91411" or Year >= '2000'").show()

Can you advise what I should do to get the result? Thank you.

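If I understand the question correctly, you want to add the condition Year >= 2000 to the current SQL statement. The query fails because of a stray " in the middle of the string, which ends the SQL text early. The Year condition should also be joined with and rather than or, and the three ZipCode comparisons need to be wrapped in parentheses (or replaced with IN) so that the Year condition applies to all of them. A working statement can look like this: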

val df6 = spark.sql("""select ZipCode, Year, HPI_with_2000_base from df1 
                         where ZipCode IN(94122, 10583, 91411) and Year >= 2000""")
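To see why the grouping matters, here is a minimal Python sketch. It is not Spark: it uses the standard-library sqlite3 module as a stand-in so it runs without a cluster, on the assumption that only the AND-before-OR precedence is of interest (Spark SQL follows the same rule). The table name df1 and the column names come from the question; all rows except (10583, 1995, 71.27) are made-up placeholder values for illustration.

```python
import sqlite3

# In-memory SQLite table standing in for the Spark view "df1" (assumption:
# only the WHERE-clause logic is being illustrated, not Spark itself).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE df1 (ZipCode INTEGER, Year INTEGER, HPI_with_2000_base REAL)"
)
conn.executemany(
    "INSERT INTO df1 VALUES (?, ?, ?)",
    [
        (10583, 1995, 71.27),  # taken from the output shown in the question
        (10583, 2001, 100.0),  # made-up placeholder row
        (94122, 1999, 100.0),  # made-up placeholder row
        (94122, 2005, 100.0),  # made-up placeholder row
        (91411, 2010, 100.0),  # made-up placeholder row
        (60601, 2003, 100.0),  # made-up row for a zip code outside the selection
    ],
)

# Without parentheses, AND binds tighter than OR, so the Year condition
# only applies to the last zip code (91411).
loose = conn.execute(
    """SELECT ZipCode, Year FROM df1
       WHERE ZipCode = 94122 OR ZipCode = 10583 OR ZipCode = 91411 AND Year >= 2000
       ORDER BY ZipCode, Year"""
).fetchall()

# With IN (equivalent to parenthesising the three ORs), the Year condition
# applies to every selected zip code.
strict = conn.execute(
    """SELECT ZipCode, Year FROM df1
       WHERE ZipCode IN (94122, 10583, 91411) AND Year >= 2000
       ORDER BY ZipCode, Year"""
).fetchall()

print("without grouping:", loose)
print("with IN:", strict)
```

The ungrouped query still returns rows from before 2000 for 94122 and 10583, while the IN version keeps only rows from 2000 onward. In PySpark the same SQL string from the answer can be passed to spark.sql without the Scala val keyword.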
