How to parse an XML node having key-value pairs using Spark Scala
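Can anyone help me filter all records with event_id = 100 and get the top five devices with the maximum duration? (The duration value here is stored in XML format.)
The column/field names are:
server-unique-id, request-type, event-id, timestamp, XML (with name and value tags), device-id, secondary-timestamp
A record in the file looks like: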
11001 ^ 1 ^ 100 ^ 2015-06-05 22:35:45.927 ^^ 0122648d-4352-4eec-9327-effae0c34ef2 ^ 2016060601
This is the code I wrote, pasted here, to get the case class values:
import org.apache.spark.sql.SparkSession
import scala.xml.XML

object RecEventId100 extends App {
  System.setProperty("hadoop.home.dir", "/home/hp/hadoop-2.5.0-cdh5.3.2")
  System.setProperty("spark.sql.warehouse.dir", "file:/home/hp/spark/spark-warehouse")
  val spark = SparkSession.builder.appName("AvgAnsTime").master("local").getOrCreate()
  val data = spark.read.textFile("/home/hp/Veeresh_data/sparkScala programs/Set_up_Box/Set_Top_Box_Data_test.txt").rdd
  // Keep only the records whose event id (third field) is 100
  val result = data.filter { lines => lines.split("\\^")(2).equals("100") }
  // Pair each record's device id (sixth field) with its XML payload (fifth field)
  val res = result.map { lines =>
    val tokens = lines.split("\\^")
    (tokens(5), tokens(4))
  }
  val parseXML = res.map { rec =>
    val xml = XML.loadString(rec._2)
    (rec._1, xml)
  }
  case class keyValCls(a: String, b: String)
  val getD = parseXML.map { line =>
    val items = line._2 \ "nv"
    val durationKey = items.map(i => i \ "@n")
    val durationVal = items.map(i => i \ "@v")
    val length = durationKey.length - 1
    for (len <- 0 to length)
      if (durationKey(len).toString().equals("Duration")) {
        println("Inside Duration - " + durationKey(len) + " Val - " + durationVal(len))
        val cc = keyValCls(durationVal(len).toString(), line._1)
        println("case class Duration - " + cc.a + " case class deviceID - " + cc.b)
      }
    keyValCls // yields the companion object, not the cc instances built in the loop
  }
}
There is no need to use a case class and then map after it.
I have rewritten one snippet of your code (the map before the case class):
val parseXML = res.map { rec =>
  val xml = XML.loadString(rec._2)
  // Select the <nv> node whose n attribute is "Duration" and read its v attribute
  val v = (xml \ "nv").filter(node => (node \@ "n") == "Duration") \@ "v"
  (rec._1, v)
}
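The question also asks for the top five devices by duration, which comes after the extraction step. Here is a rough sketch of that ranking step on plain Scala collections, so it can be tried without a cluster. It rests on a few assumptions that are not in the original post: the duration strings are whole numbers, a device may appear in several records (the longest one is kept), and the names `TopDevices`, `parseDuration` and `topDevices` are made up for illustration.

```scala
object TopDevices {
  // Parse a duration string; empty or non-numeric values become None.
  def parseDuration(s: String): Option[Long] =
    scala.util.Try(s.trim.toLong).toOption

  // Keep the longest duration seen per device, then return the n largest.
  // Input pairs are (deviceId, durationString), the same shape parseXML produces.
  def topDevices(records: Seq[(String, String)], n: Int): Seq[(String, Long)] =
    records
      .flatMap { case (id, d) => parseDuration(d).map(id -> _) } // drop unparsable rows
      .groupBy(_._1)
      .map { case (id, rows) => id -> rows.map(_._2).max }       // longest per device
      .toSeq
      .sortBy { case (_, d) => -d }                               // descending by duration
      .take(n)
}
```

On the RDD itself the same idea would probably be a `reduceByKey` that keeps the larger value, followed by `top(5)` with an ordering on the duration, but I have not run that against the real data.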