Match a String based on regex pattern matching in Scala
I wrote the following regular expression:
val reg = ".+([A-Z_].+).(\\d{4})_(\\d{2})_(\\d{2})_(\\d{2})\\.orc".r
It should parse the following string: "S3//bucket//TS11_YREDED.2018_09_28_02.orc". The parsing method is:
val dataExtraction: String => Map[String, String] = {
  string: String => {
    string match {
      case reg(filename, year, month, day) =>
        Map(FILE_NAME-> filename, YEAR -> year, MONTH -> month, DAY -> day)
      case _ => Map(FILE_NAME-> filename,YEAR -> "", MONTH -> "", DAY -> "")
    }
  }
}
val YEAR = "YEAR"
val MONTH = "MONTH"
val DAY = "DAY"
val FILE_NAME = "FILE_NAME"
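But it does not work properly: it should ignore the bucket name and parse out the file name and the date.
So the expected output should be: Map(FILE_NAME -> TS11_YREDED, YEAR -> 2018, MONTH -> 09, DAY -> 28). How can I fix it?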
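The .+ part of the pattern first grabs the whole string and then backtracks only as far as it has to, so ([A-Z_].+) captures only the little that is left over for the subsequent patterns to match.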
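To make that concrete, here is a minimal sketch that runs the question's pattern against the sample string. The names original, withFiveBinders and withFourBinders are made up for illustration. It first binds five variables to show what group 1 really captures, then four variables as in the question.

```scala
// The regex from the question, unchanged (it has five capturing groups).
val original = ".+([A-Z_].+).(\\d{4})_(\\d{2})_(\\d{2})_(\\d{2})\\.orc".r
val text = "S3//bucket//TS11_YREDED.2018_09_28_02.orc"

// Binding all five groups shows how little is left for group 1
// once the leading greedy .+ has taken as much as it can.
val withFiveBinders = text match {
  case original(name, year, month, day, hour) => Some((name, year, month, day, hour))
  case _ => None
}
println(withFiveBinders) // Some((ED,2018,09,28,02))

// Binding only four variables, as in the question, never matches,
// because the extractor yields five groups.
val withFourBinders = text match {
  case original(name, year, month, day) => Some(name)
  case _ => None
}
println(withFourBinders) // None
```

So two things go wrong at once: the number of bound variables differs from the number of groups, and even with five variables the file name would come out as ED.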
You can use
"""(?:.*/)?(.*)\.(\d{4})_(\d{2})_(\d{2})_\d{2}\.orc""".r
Note that the dots must be escaped to match literal dots.
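A small sketch of the difference, using shortened patterns that are not part of the answer (unescaped, escaped and fullMatch are illustrative names only):

```scala
import scala.util.matching.Regex

// Unescaped: "." matches any single character except a line break.
val unescaped = "2018.orc".r
// Escaped: "\." matches only a literal dot.
val escaped = """2018\.orc""".r

// Hypothetical helper: true when the regex matches the whole input.
def fullMatch(r: Regex, s: String): Boolean = r.pattern.matcher(s).matches()

println(fullMatch(unescaped, "2018_orc")) // true
println(fullMatch(escaped, "2018_orc"))   // false
println(fullMatch(escaped, "2018.orc"))   // true

// Triple-quoted strings avoid doubling the backslashes.
println("\\d{4}" == """\d{4}""") // true
```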
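Details
(?:.*/)? - an optional non-capturing group: any 0+ characters other than line breaks, as many as possible, up to and including the last /
(.*) - capturing group 1: any 0+ characters other than line breaks, as many as possible
\. - a literal dot
(\d{4}) - capturing group 2: four digits
_ - an underscore
(\d{2}) - capturing group 3: two digits
_ - an underscore
(\d{2}) - capturing group 4: two digits
_\d{2}\.orc - an underscore, two digits, a literal dot and orc at the end of the string.
Scala demo: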
val text = "S3//bucket//TS11_YREDED.2018_09_28_02.orc"
val reg = """(?:.*/)?(.*)\.(\d{4})_(\d{2})_(\d{2})_\d{2}\.orc""".r
val YEAR = "YEAR"
val MONTH = "MONTH"
val DAY = "DAY"
val FILE_NAME = "FILE_NAME"
val dataExtraction: String => Map[String, String] = {
  string: String => {
    string match {
      case reg(filename, year, month, day) =>
        Map(FILE_NAME-> filename, YEAR -> year, MONTH -> month, DAY -> day)
      case _ => Map(FILE_NAME-> FILE_NAME,YEAR -> YEAR, MONTH -> MONTH, DAY -> DAY)
    }
  }
}
println(dataExtraction(text))
// => Map(FILE_NAME -> TS11_YREDED, YEAR -> 2018, MONTH -> 09, DAY -> 28)
Since you are not using the last capturing group, you can omit it from the pattern.
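As a quick check of the optional prefix group, the sketch below feeds the same pattern a path with a bucket prefix, a bare file name, and a name that lacks the trailing two-digit part. The helper extract and its Option return type are assumptions for illustration, not part of the answer.

```scala
val reg = """(?:.*/)?(.*)\.(\d{4})_(\d{2})_(\d{2})_\d{2}\.orc""".r

// Hypothetical helper: Some((file, year, month, day)) on a match, None otherwise.
def extract(s: String): Option[(String, String, String, String)] = s match {
  case reg(file, year, month, day) => Some((file, year, month, day))
  case _ => None
}

// With a bucket prefix: everything up to the last / is skipped.
println(extract("S3//bucket//TS11_YREDED.2018_09_28_02.orc")) // Some((TS11_YREDED,2018,09,28))
// Without any prefix: the optional group matches nothing.
println(extract("TS11_YREDED.2018_09_28_02.orc")) // Some((TS11_YREDED,2018,09,28))
// Trailing two-digit part missing: the pattern does not match.
println(extract("TS11_YREDED.2018_09_28.orc")) // None
```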
Take a look at this:
val file_name = "TS11_YREDED.2018_09_28_02.orc"
val reg = """(.*?)\.(\d{4})_(\d{2})_(\d{2})_(\d{2})\.orc""".r
var file_details = scala.collection.mutable.ArrayBuffer[String]()
reg.findAllIn(file_name).matchData.foreach( m => file_details.appendAll(m.subgroups))
val names=Array("FILE_NAME","YEAR","MONTH","DAY","DUMMY")
for( (x,y) <- names.zip(file_details).toMap)
  println(x + "->" + y)
//DUMMY->02
//DAY->28
//FILE_NAME->TS11_YREDED
//MONTH->09
//YEAR->2018
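The same idea can be sketched without a mutable buffer. This is only one possible variation, assuming a single match per file name is enough; fileName, names and details are illustrative names.

```scala
val fileName = "TS11_YREDED.2018_09_28_02.orc"
val reg = """(.*?)\.(\d{4})_(\d{2})_(\d{2})_(\d{2})\.orc""".r
val names = List("FILE_NAME", "YEAR", "MONTH", "DAY", "DUMMY")

// findFirstMatchIn returns Option[Match]; subgroups lists the captured
// groups in order, so zipping pairs each key with its value.
val details: Option[Map[String, String]] =
  reg.findFirstMatchIn(fileName).map(m => names.zip(m.subgroups).toMap)

println(details.map(_("FILE_NAME"))) // Some(TS11_YREDED)
println(details.map(_("YEAR")))      // Some(2018)
```

A file name that does not match simply yields None instead of an empty buffer.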