How to map an RDD of type org.apache.spark.rdd.RDD[Array[String]]?
Scala - How to filter an RDD of type org.apache.spark.rdd.RDD[String]
I have an RDD that I need to filter by price. This is the RDD:
id category_id product_name price
1 2 Quest Q64 10 FT. x 10 FT. Slant Leg Instant U 59.98
2 2 Under Armour Men's Highlight MC Football Clea 129.99
3 2 Under Armour Men's Renegade D Mid Football Cl 89.99
4 2 Under Armour Men's Renegade D Mid Football Cl 89.99
5 2 Riddell Youth Revolution Speed Custom Footbal 199.99
6 2 Jordan Men's VI Retro TD Football Cleat 134.99
7 2 Schutt Youth Recruit Hybrid Custom Football H 99.99
8 2 Nike Men's Vapor Carbon Elite TD Football Cle 129.99
9 2 Nike Adult Vapor Jet 3.0 Receiver Gloves 50.0
I get the following error:
scala> val rdd2 = rdd1.map(_.split("\t")).map(c => c(3) < 100)
<console>:44: error: type mismatch;
 found   : Int(100)
 required: String
       val rdd2 = rdd1.map(_.split("\t")).map(c => c(3) < 100)
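The error occurs because split("\t") yields an Array[String], so c(3) is a String and cannot be compared to the Int 100. Converting the field with .toDouble first resolves it. A minimal sketch of the fix on a plain Scala collection (same logic as the RDD version, no Spark needed; the sample rows are taken from the data above):

```scala
// Two sample lines in the same tab-separated layout as the RDD.
val lines = Seq(
  "1\t2\tQuest Q64 10 FT. x 10 FT. Slant Leg Instant U\t59.98",
  "2\t2\tUnder Armour Men's Highlight MC Football Clea\t129.99"
)

// Split on tabs, then convert the price column (index 3) to Double
// before comparing, instead of comparing a String to an Int.
val cheap = lines.map(_.split("\t")).filter(c => c(3).toDouble < 100)
// Only the 59.98 row survives the filter.
```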
df.printSchema()
root
 |-- id: integer (nullable = true)
 |-- category_id: integer (nullable = true)
 |-- product_name: string (nullable = true)
 |-- price: double (nullable = true)
 |-- image: string (nullable = true)
Given this df.printSchema() output, price is already a double column, so you can filter the table on it directly:

df.filter(df.col("price") < 100).show
You can simply read the file with sparkContext.textFile and compute as follows (the first filter drops the header row, whose price field is the literal string "price"):

val rdd1 = sparkSession.sparkContext.textFile("text file location")
val rdd2 = rdd1.map(_.split("\t")).filter(c => !"price".equalsIgnoreCase(c(3).trim)).filter(c => c(3).toDouble < 100)
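The same two-filter pipeline can be sketched on a plain Scala collection to show how the header row is excluded before the numeric comparison (sample rows taken from the data above; no Spark required):

```scala
// First line is the header; its "price" field is not a number.
val lines = Seq(
  "id\tcategory_id\tproduct_name\tprice",
  "1\t2\tQuest Q64 10 FT. x 10 FT. Slant Leg Instant U\t59.98",
  "5\t2\tRiddell Youth Revolution Speed Custom Footbal\t199.99"
)

val rows = lines.map(_.split("\t"))
  .filter(c => !"price".equalsIgnoreCase(c(3).trim)) // drop the header row
  .filter(c => c(3).toDouble < 100)                  // keep prices under 100
// The header and the 199.99 row are removed; only id 1 remains.
```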
If you already have a dataframe, there is no need to convert it back to an rdd for the computation. You can filter the dataframe itself:

val finaldf = df.filter($"price" =!= "price").filter($"price".cast(DoubleType) < 100)

(This needs import org.apache.spark.sql.types.DoubleType, and the $ column syntax needs import sparkSession.implicits._.)