
Match a String based on regex pattern matching in Scala

I wrote the following regex:

val reg = ".+([A-Z_].+).(\\d{4})_(\\d{2})_(\\d{2})_(\\d{2})\\.orc".r 

It should parse the following string: "S3//bucket//TS11_YREDED.2018_09_28_02.orc". The parsing method is:

val dataExtraction: String => Map[String, String] = {
  string: String => {
    string match {
      case reg(filename, year, month, day) =>
                 Map(FILE_NAME-> filename, YEAR -> year, MONTH -> month, DAY -> day)
      case _  => Map(FILE_NAME-> filename,YEAR -> "", MONTH -> "", DAY -> "")
    }
  }
}
val YEAR = "YEAR"
val MONTH = "MONTH"
val DAY = "DAY"
val FILE_NAME = "FILE_NAME"

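But it does not work as expected: it should ignore the bucket name and parse only the file name and the date.

So the expected output is: Map(FILE_NAME -> TS11_YREDED, YEAR -> 2018, MONTH -> 09, DAY -> 28). How can I fix it?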

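The .+ part of the pattern is greedy: it first grabs the whole string and then gives back only as few characters as needed, so ([A-Z_].+) captures just what is left over for the subsequent patterns to match.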
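The effect of the greedy .+ can be checked directly. The following is a minimal sketch that runs the regex from the question unchanged; the object name GreedyDemo and the helper capturedName are made up for illustration.

```scala
object GreedyDemo {
  // The regex from the question, unchanged (five capturing groups).
  val original = ".+([A-Z_].+).(\\d{4})_(\\d{2})_(\\d{2})_(\\d{2})\\.orc".r

  // Returns the content of the first capturing group, or None if nothing matches.
  def capturedName(s: String): Option[String] =
    original.findFirstMatchIn(s).map(_.group(1))

  def main(args: Array[String]): Unit =
    // The leading .+ swallows "S3//bucket//TS11_YRED", leaving only "ED" for group 1.
    println(capturedName("S3//bucket//TS11_YREDED.2018_09_28_02.orc")) // Some(ED)
}
```

The regex does find a match, but the first group holds only the last two characters of the file name, not the whole name.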

You can use

"""(?:.*/)?(.*)\.(\d{4})_(\d{2})_(\d{2})_\d{2}\.orc""".r


Note that the dots must be escaped to match literal dots.

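Details

  • (?:.*/)? - an optional non-capturing group: any 0+ characters other than line breaks, as many as possible, up to and including the last /
  • (.*) - capturing group 1: any 0+ characters other than line breaks, as many as possible
  • \. - a dot
  • (\d{4}) - capturing group 2: four digits
  • _ - an underscore
  • (\d{2}) - capturing group 3: two digits
  • _ - an underscore
  • (\d{2}) - capturing group 4: two digits
  • _\d{2}\.orc - an underscore, two digits, a dot, and orc at the end of the string (a Regex used as a pattern in a match block must match the whole input).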
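As a quick check of the breakdown above, here is a small sketch that lists the captured groups for a few inputs. The object name PatternDetailsDemo, the helper parts, and the second and third sample inputs are assumptions added for illustration; only the first input comes from the question.

```scala
object PatternDetailsDemo {
  val reg = """(?:.*/)?(.*)\.(\d{4})_(\d{2})_(\d{2})_\d{2}\.orc""".r

  // unapplySeq yields the capturing groups when the whole input matches, else None.
  def parts(s: String): Option[List[String]] = reg.unapplySeq(s)

  def main(args: Array[String]): Unit = {
    // Path with a prefix: everything up to the last / is skipped by (?:.*/)?
    println(parts("S3//bucket//TS11_YREDED.2018_09_28_02.orc"))
    // Bare file name: the optional prefix group simply matches nothing.
    println(parts("TS11_YREDED.2018_09_28_02.orc"))
    // Input without the date suffix: no match at all.
    println(parts("S3//bucket//notes.txt"))
  }
}
```

The first two calls give the same four groups (file name, year, month, day), and the last one gives None.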

Scala demo:

val text = "S3//bucket//TS11_YREDED.2018_09_28_02.orc"
val reg = """(?:.*/)?(.*)\.(\d{4})_(\d{2})_(\d{2})_\d{2}\.orc""".r

val YEAR = "YEAR"
val MONTH = "MONTH"
val DAY = "DAY"
val FILE_NAME = "FILE_NAME"

val dataExtraction: String => Map[String, String] = {
  string: String => {
    string match {
      case reg(filename, year, month, day) =>
                 Map(FILE_NAME-> filename, YEAR -> year, MONTH -> month, DAY -> day)
      case _  => Map(FILE_NAME-> FILE_NAME,YEAR -> YEAR, MONTH -> MONTH, DAY -> DAY)
    }
  }
}

println(dataExtraction(text))
// => Map(FILE_NAME -> TS11_YREDED, YEAR -> 2018, MONTH -> 09, DAY -> 28)

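Since you are not using the last capturing group, it can be omitted from the pattern. This also matters for the match itself: a case such as reg(filename, year, month, day) only matches when the regex has exactly four capturing groups.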
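To illustrate why the number of capturing groups matters, here is a sketch that keeps the fifth group in the regex. The names GroupCountDemo, fiveGroups, fourBinders and withWildcard are hypothetical; the point is only that the extractor needs one binder per capturing group.

```scala
object GroupCountDemo {
  // Same pattern as in the demo, but with the trailing two digits captured as a fifth group.
  val fiveGroups = """(?:.*/)?(.*)\.(\d{4})_(\d{2})_(\d{2})_(\d{2})\.orc""".r

  // Four binders against five groups: this compiles, but the case never matches.
  def fourBinders(s: String): String = s match {
    case fiveGroups(name, year, month, day) => s"$name $year-$month-$day"
    case _                                  => "no match"
  }

  // A wildcard for the unused fifth group restores the match.
  def withWildcard(s: String): String = s match {
    case fiveGroups(name, year, month, day, _) => s"$name $year-$month-$day"
    case _                                     => "no match"
  }

  def main(args: Array[String]): Unit = {
    val text = "S3//bucket//TS11_YREDED.2018_09_28_02.orc"
    println(fourBinders(text))  // no match
    println(withWildcard(text)) // TS11_YREDED 2018-09-28
  }
}
```

Dropping the parentheses around the last \d{2}, as the pattern above does, avoids the extra binder altogether.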

Take a look at this:

val file_name = "TS11_YREDED.2018_09_28_02.orc"
val reg = """(.*?)\.(\d{4})_(\d{2})_(\d{2})_(\d{2})\.orc""".r
var file_details = scala.collection.mutable.ArrayBuffer[String]()
reg.findAllIn(file_name).matchData.foreach( m => file_details.appendAll(m.subgroups))
val names=Array("FILE_NAME","YEAR","MONTH","DAY","DUMMY")
for( (x,y) <- names.zip(file_details).toMap)
  println(x + "->" + y)

//DUMMY->02
//DAY->28
//FILE_NAME->TS11_YREDED
//MONTH->09
//YEAR->2018
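The same idea can probably be written without the mutable buffer. The following is only a sketch, assuming a single match per file name is enough; the object name SubgroupsDemo and the helper details are made up for illustration.

```scala
object SubgroupsDemo {
  val reg = """(.*?)\.(\d{4})_(\d{2})_(\d{2})_(\d{2})\.orc""".r
  val names = List("FILE_NAME", "YEAR", "MONTH", "DAY", "DUMMY")

  // Pairs each key with the corresponding capturing group of the first match;
  // returns an empty map when the file name does not match.
  def details(fileName: String): Map[String, String] =
    reg.findFirstMatchIn(fileName)
      .map(m => names.zip(m.subgroups).toMap)
      .getOrElse(Map.empty[String, String])

  def main(args: Array[String]): Unit =
    for ((key, value) <- details("TS11_YREDED.2018_09_28_02.orc"))
      println(key + "->" + value)
}
```

As in the version above, the iteration order of the resulting Map is not guaranteed to follow the order of names.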
