I have a dataframe
Hi, I have a dataframe as below:
+---+-----+
| id|level|
+---+-----+
|  0|    0|
|  1|    0|
|  2|    1|
|  3|    1|
|  4|    1|
|  5|    0|
|  6|    1|
|  7|    1|
|  8|    0|
|  9|    1|
| 10|    0|
+---+-----+
and I need the sum of each run of consecutive 1's, so the output should be 3, 2, 1. However, the constraint in this scenario is that I cannot use a UDF. Is there any built-in Scala/Spark function that can do this?
You could use row_number and count (SQL/DataFrame API) to count the number of consecutive repeated values in a column. The trick is to subtract a row's position among the rows with the same level from its position in the whole table. That difference stays constant within a run of consecutive equal values and changes between runs, so it can be used as a grouping key.
// Sample data: id gives the row order, level is the 0/1 flag.
val df = spark.createDataFrame(Seq((0,0),(1,0),(2,1),(3,1),(4,1),(5,0),(6,1),(7,1),(8,0),(9,1),(10,0))).toDF("id","level")
df.createOrReplaceTempView("DT")
// grp is constant within each run of equal level values; count the rows in each run of 1's.
val df_cnt = spark.sql("select level, count(*) from (select *, (row_number() over (order by id) - row_number() over (partition by level order by id) ) as grp from DT order by id) as t where level != 0 group by grp, level")
df_cnt.show()
The ordering by id must reflect the original row sequence; otherwise the query will produce the wrong result.
In PySpark:
df = spark.createDataFrame([(0,0),(1,0),(2,1),(3,1),(4,1),(5,0),(6,1),(7,1),(8,0),(9,1),(10,0)]).toDF("id", "level")
df.createOrReplaceTempView("DF")
# Same query as before, passed to spark.sql(...)
df_cnt = spark.sql("""
select level, count(*) from
  (select *,
    (row_number() over (order by id) -
     row_number() over (partition by level order by id)
    ) as grp
   from DF order by id) as t
where level != 0
group by grp, level
""")
df_cnt.show()
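To see why the difference of the two row numbers works as a grouping key, here is a minimal plain-Python sketch that mimics the query above on the sample data. It does not use Spark at all; the function name `run_lengths` and its `target` parameter are made up for this illustration.

```python
# (id, level) pairs from the question.
rows = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 1), (5, 0),
        (6, 1), (7, 1), (8, 0), (9, 1), (10, 0)]

def run_lengths(rows, target=1):
    """Count runs of `target` using the row-number difference as the group key."""
    seen_per_level = {}  # mimics row_number() over (partition by level order by id)
    counts = {}          # mimics group by grp, level
    # sorted() orders by id; enumerate mimics row_number() over (order by id)
    for overall_rn, (_id, level) in enumerate(sorted(rows), start=1):
        level_rn = seen_per_level.get(level, 0) + 1
        seen_per_level[level] = level_rn
        grp = overall_rn - level_rn  # constant inside a run of equal levels
        if level == target:          # mimics where level != 0
            counts[(grp, level)] = counts.get((grp, level), 0) + 1
    return list(counts.values())

result = run_lengths(rows)
print(result)  # [3, 2, 1]
```

For the rows with level 1, grp comes out as 2, 2, 2, then 3, 3, then 4, which is why grouping by it separates the three runs.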
You could do something like this:
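val seq = Seq(0,0,1,1,1,0,1,1,0,1,0)
// Join the values into one string and split on "0" to isolate the runs of 1's.
val seq1s = seq.mkString.split("0")
// Count the 1's in each piece; empty pieces give 0.
seq1s.map(_.count(_ == '1'))
res: Array[Int] = Array(0, 0, 3, 2, 1)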
If you don't want the 0s there, you could filter them out using this instead:
seq1s.map(_.count(_ == '1')).filterNot(_ == 0)
res: Array[Int] = Array(3, 2, 1)
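The same run counting on a local sequence can also be sketched without the detour through strings. The Python example below uses `itertools.groupby` from the standard library, which groups adjacent equal elements. It works on an in-memory list, not on a Spark DataFrame, and the helper name `count_runs` is made up for this illustration.

```python
from itertools import groupby

levels = [0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0]

def count_runs(values, target=1):
    """Return the length of every run of consecutive `target` values."""
    # groupby yields one (key, iterator) pair per run of adjacent equal elements
    return [sum(1 for _ in run) for key, run in groupby(values) if key == target]

runs = count_runs(levels)
print(runs)  # [3, 2, 1]
```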