How to filter data from an RDD and save it to a text file using Scala in Spark
val dfTsv1 = spark.read.format("com.databricks.spark.csv")
  .option("delimiter", "\t")
  .load("filepath1")
val dfTsv2 = spark.read.format("com.databricks.spark.csv")
  .option("delimiter", "\t")
  .load("filepath2")

val duplicateColumns = List("") // put your duplicate column names here

// Join on the same column from each side: the right-hand side of the
// condition must reference dfTsv2, not dfTsv1.
val outputDf = dfTsv1.alias("tcv1")
  .join(dfTsv2.alias("tcv2"), dfTsv1("ACCESSED_MONTH") === dfTsv2("ACCESSED_MONTH"))
  .drop(duplicateColumns: _*)
outputDf.show()
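The question also asks how to filter the data and save it to a text file, which the snippet above does not show. A minimal sketch, assuming the joined `outputDf` from above; the filter value `"2023-01"` and the output path `"outputpath"` are hypothetical placeholders, not from the original post:

```scala
// Filter the joined result on a column; the comparison value is a made-up example.
val filteredDf = outputDf.filter(outputDf("ACCESSED_MONTH") === "2023-01")

// Write the result back out as tab-delimited text files,
// matching the TSV format of the input.
filteredDf.write
  .option("delimiter", "\t")
  .mode("overwrite")
  .csv("outputpath")

// Alternatively, at the RDD level as plain text lines:
// filteredDf.rdd.map(_.mkString("\t")).saveAsTextFile("outputpath")
```

Spark writes a directory of part files rather than a single file; coalesce with `filteredDf.coalesce(1)` before writing if one file is required and the data is small enough.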
An intersection is nothing but an inner join, so simply perform an inner join on the two DataFrames. See the Spark SQL join documentation for reference.
val df = df1.join(df2, Seq("APP_NAME"), "inner")