Spark Scala filter using placeholder syntax
I have the following source file:
id,name,year,rating,duration
1,The Nightmare Before Christmas,1993,3.9,4568
2,The Mummy,1932,3.5,4388
3,Orphans of the Storm,1921,3.2,9062
4,The Object of Beauty,1991,2.8,6150
5,Night Tide,1963,2.8,5126
6,One Magic Christmas,1985,3.8,5333
I am trying to filter all rows where year=2012, and the following works:
c.map(_.split(",")).filter(x=>x(2).toInt==2012)
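(For context, a minimal sketch of how c could be built; the setup below is an assumption, not part of the original question: c is taken to be an RDD[String] of the data lines with the header removed.)

import org.apache.spark.SparkContext

val sc = SparkContext.getOrCreate()
val lines = sc.textFile("data1.csv")   // hypothetical path, matching the answer below
val header = lines.first()
val c = lines.filter(_ != header)      // drop the header row so toInt does not fail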
But how can I achieve the same using placeholder syntax (_)?
I could use placeholder syntax (_) in the map function (e.g. rdd.map(_.split(","))).
Please advise.
Is this what you are looking for?
c.map(_.split(",")).filter(_(2).toInt==2012)
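(A brief note on why this works; the name parsed below is hypothetical: _(2) expands to a function that calls apply(2) on its argument, so the two filters are equivalent.)

val parsed = c.map(_.split(","))
parsed.filter(_(2).toInt == 2012)                // placeholder form
parsed.filter(x => x.apply(2).toInt == 2012)     // what the compiler expands it to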
But I suggest you use Spark-CSV to read the csv file, like
val df1 = spark.read.option("inferSchema", true)
.option("header",true)
.option("delimiter", ",")
.csv("data1.csv")
and then you can filter easily as
df1.filter($"year" === "2012")
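(Two assumptions worth noting about this snippet: the $"year" syntax requires import spark.implicits._ to be in scope, and since inferSchema is true the year column comes in as a numeric type, so an integer comparison works as well.)

import spark.implicits._               // enables the $"col" column syntax

df1.filter($"year" === 2012).show()    // numeric comparison against the inferred schema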
Hope this helps.
You can use the placeholder by simply doing the following:
c.map(_.split(",")).filter(_(2).toInt==2012).map(_.toSeq).foreach(println)
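(A side note on the .map(_.toSeq) step, which the answer does not explain: calling println on a raw Array only prints a JVM reference, so converting to a Seq is what makes the rows readable.)

val arr = Array("1", "The Nightmare Before Christmas", "1993")
println(arr)        // [Ljava.lang.String;@... (unreadable reference)
println(arr.toSeq)  // WrappedArray(1, The Nightmare Before Christmas, 1993) on Scala 2.12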
But I would suggest going with a case class if you know your data has a fixed number of fields:
case class row(id: String,
name: String,
year: String,
rating: String,
duration: String)
and you can use it as
c.map(_.split(",", -1)).map(array => row(array(0),array(1),array(2),array(3),array(4))).filter(x => x.year.toInt == 2012).foreach(println)
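(One caveat, not covered in the original answer: if c still contains the header line, year.toInt throws a NumberFormatException on the literal string "year". A common way to drop it first, assuming the RDD API is used throughout:)

val noHeader = c.mapPartitionsWithIndex { (idx, it) =>
  if (idx == 0) it.drop(1) else it     // drop the first line of the first partition only
}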
Or, to be on the safe side, you can combine it with Option as
c.map(_.split(",", -1)).map(array => {
row(Option(array(0)) getOrElse "",
Option(array(1)) getOrElse "",
Option(array(2)) getOrElse "",
Option(array(3)) getOrElse "",
Option(array(4)) getOrElse "")
})
.filter(x => x.year.toInt == 2012)
.foreach(println)
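(The -1 limit passed to split is what makes the Option padding useful for rows with trailing commas: without it, Java's split drops trailing empty fields and shrinks the array.)

"5,Night Tide,1963,,".split(",").length       // 3 -- trailing empty fields dropped
"5,Night Tide,1963,,".split(",", -1).length   // 5 -- empty strings preserved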