
Regex to Select only Pattern Match Files

An FTP server has the following files created on a daily basis:

  • FGI_WTYUIO_D_2016_04_16_BS.daily.gzip - BS File
  • FGI_WTYUIO_D_2016_04_16_BV.daily.gzip - BV File
  • FGI_GHJK_D_2016_04_16_SATB3.daily.gzip - B3 File
  • FKI_GHJK_D_2016_04_16_SAT.daily.gzip - BV File
  • FKI_GHJK_D_2016_04_16_SATB3.daily.gzip - B3 File
  • FKI_GHJK_D_2016_04_16_SATBS.daily.gzip - BS File
  • FKI_GHJK_D_2016_04_16_SSD.daily.gzip - Need to Ignore
  • FKI_GHJK_D_2016_04_16_SSDBS.daily.gzip - Need to Ignore

So, basically there are two file types:

  • FGI
  • FKI

and three report codes for each file type:

  • BS
  • BV
  • B3

I need to ignore the rest of the files (the SSD files).

I need to write a regex pattern in JavaScript to fetch these files, using the following variables:

  • fileDate - Date ex. 2016_04_16
  • matchReportCode - the wanted report codes, given by their second letter, ex. 'SV' for BS and BV

So, if fileDate = 2016_04_15 and matchReportCode = 'SV' (BS, BV), then I should fetch only the following files:

  • FGI_WTYUIO_D_2016_04_15_BS.daily.gzip - FGI BS File
  • FGI_WTYUIO_D_2016_04_15_BV.daily.gzip - FGI BV File
  • FKI_GHJK_D_2016_04_15_SAT.daily.gzip - FKI BV File
  • FKI_GHJK_D_2016_04_15_SATBS.daily.gzip - FKI BS File

And if fileDate = 2016_04_19 and matchReportCode = '3S' (B3, BS), then I should fetch only the following files:

  • FGI_WTYUIO_D_2016_04_19_BS.daily.gzip - FGI BS File
  • FGI_GHJK_D_2016_04_19_SATB3.daily.gzip - FGI B3 File
  • FKI_GHJK_D_2016_04_19_SATB3.daily.gzip - FKI B3 File
  • FKI_GHJK_D_2016_04_19_SATBS.daily.gzip - FKI BS File
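Before writing the regex, it may help to pin the rules down as a plain function and use it to check candidate patterns. The sketch below is one way to do that, not a finished solution: wantedFile is a hypothetical name, and it assumes every name has exactly the shape TYPE_NAME_D_YYYY_MM_DD_SUFFIX.daily.gzip seen in the list above, with matchReportCode holding the second letters of the wanted codes ('SV' for BS and BV). The sample names use the 2016_04_16 date from the file list.

```javascript
// Hypothetical reference filter (no regex) that spells out the rules.
// Assumes names look like TYPE_NAME_D_YYYY_MM_DD_SUFFIX.daily.gzip.
function wantedFile(name, fileDate, matchReportCode) {
  const ext = ".daily.gzip";
  if (!name.endsWith(ext)) return false;

  const parts = name.slice(0, -ext.length).split("_");
  if (parts.length !== 7 || parts[2] !== "D") return false;

  const [type, , , yyyy, mm, dd, suffix] = parts;
  if (type !== "FGI" && type !== "FKI") return false;
  if (`${yyyy}_${mm}_${dd}` !== fileDate) return false;
  if (suffix.startsWith("SSD")) return false; // SSD files are ignored

  // The report code is the end of the suffix (BS, BV or B3) ...
  let code = ["BS", "BV", "B3"].find((c) => suffix.endsWith(c));
  // ... except for FKI BV files, which carry no code at all (e.g. SAT).
  if (code === undefined && type === "FKI") code = "BV";
  if (code === undefined) return false;

  // matchReportCode lists second letters, e.g. "SV" means BS and BV.
  return matchReportCode.includes(code[1]);
}

const files = [
  "FGI_WTYUIO_D_2016_04_16_BS.daily.gzip",
  "FGI_WTYUIO_D_2016_04_16_BV.daily.gzip",
  "FGI_GHJK_D_2016_04_16_SATB3.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SAT.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SATB3.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SATBS.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SSD.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SSDBS.daily.gzip",
];

console.log(files.filter((f) => wantedFile(f, "2016_04_16", "SV")));
console.log(files.filter((f) => wantedFile(f, "2016_04_16", "3S")));
```

With 'SV' this keeps the FGI BS and BV files plus the FKI SAT and SATBS files; with '3S' it keeps the FGI BS and SATB3 files plus the FKI SATB3 and SATBS files, the same kinds of files as in the two examples above.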

So far I could only come up with this.

FileRegex = "F[KG]I_.*_D_" + fileDate + "_[A-z]{0,3}L{0,1}[" + matchReportCode + "]{0,1}.daily.gzip";
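To see why this attempt misbehaves, it can be compiled with new RegExp and tried on a few of the listed names. In the sketch below, attempt is a hypothetical wrapper around the pattern string above, and fileDate = '2016_04_16' is assumed because that is the date of the sample files.

```javascript
// Hypothetical wrapper around the pattern string from the question.
function attempt(fileDate, matchReportCode) {
  const FileRegex = "F[KG]I_.*_D_" + fileDate + "_[A-z]{0,3}L{0,1}[" +
    matchReportCode + "]{0,1}.daily.gzip";
  return new RegExp(FileRegex);
}

// An SSD file is accepted: [A-z]{0,3} consumes "SSD", both optional parts
// match nothing, and the unescaped . matches the literal dot.
console.log(attempt("2016_04_16", "SV").test("FKI_GHJK_D_2016_04_16_SSD.daily.gzip")); // true

// A B3 file is rejected even when 3 is requested: [A-z]{0,3} takes at most
// "SAT", and L{0,1}[3S]{0,1} cannot match the following "B3"
// (L{0,1} was probably meant to be B{0,1}).
console.log(attempt("2016_04_16", "3S").test("FKI_GHJK_D_2016_04_16_SATB3.daily.gzip")); // false

// [A-z] runs from character code 65 to 122, which includes [ \ ] ^ _ and `.
console.log(/^[A-z]$/.test("_")); // true
```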

Can someone please help? I am new to regex. Thanks.

The following may be a bit better:

FileRegex = "F[KG]I_[^_]+_D_" + fileDate + "_(?!SSD)[a-zA-Z]{0,3}((B[" + matchReportCode + "])|(?<=^FKI.*))\\.daily\\.gzip";

This will match FKI and FGI file names that have the chosen fileDate and up to three letters before the chosen report code.

The other changes include changing [A-z] to [a-zA-Z]. A character-class range is based on character codes, and in ASCII there are six non-letter characters between Z and a ([ \ ] ^ _ `), so [A-z] also matches them, including the _ that separates the parts of your file names. Matching only letters appears to be your intent.

Also, .*_ became [^_]+_. This requires at least one character other than the underscore in that part, keeps the match from running across several parts so the engine has less to backtrack, and makes the regex easier to edit if another part is added.

The dots in .daily.gzip are now escaped (\\. inside the string), because an unescaped . matches any character.

I also added a negative lookahead (?!SSD) at the start of the last segment, which requires that the segment not start with SSD.

The or condition at the end, ((B[" + matchReportCode + "])|(?<=^FKI.*)), requires either that the name contain B followed by one of the report-code letters, or that the file name start with FKI. The second alternative is a lookbehind: it consumes no characters, and the .* lets it reach from the current position back to the start of the name. It is there to cover the FKI BV file, which has no report-code suffix. The ^ is the start anchor when used outside of a character class ([...]). Lookbehind requires a JavaScript engine that supports ES2018.
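A quick way to check this pattern is to run it over the sample names. The sketch below writes the lookbehind as (?<=^FKI.*), escapes the dots, and uses fileDate = '2016_04_16'; answerOneRegex, buildFileRegex and pick are hypothetical names. It shows one limitation: because the FKI alternative ignores matchReportCode, the FKI BV file (..._SAT.daily.gzip) is also returned when only B3 and BS are requested. buildFileRegex is one possible way around that, an assumption rather than part of the answer: it adds the FKI no-suffix branch only when V is requested and anchors the whole name with ^ and $.

```javascript
// The answer's pattern, with the lookbehind spelled (?<=...) and the dots escaped.
// Lookbehind needs an ES2018+ engine.
function answerOneRegex(fileDate, matchReportCode) {
  return new RegExp("F[KG]I_[^_]+_D_" + fileDate + "_(?!SSD)[a-zA-Z]{0,3}((B[" +
    matchReportCode + "])|(?<=^FKI.*))\\.daily\\.gzip");
}

// Hypothetical variant: the FKI branch without a report code is only added
// when V (i.e. BV) is requested, and the whole name is anchored.
function buildFileRegex(fileDate, matchReportCode) {
  const middle = "_[^_]+_D_" + fileDate + "_(?!SSD)[A-Za-z]{0,3}";
  const branches = ["F[KG]I" + middle + "B[" + matchReportCode + "]"];
  if (matchReportCode.includes("V")) branches.push("FKI" + middle);
  return new RegExp("^(?:" + branches.join("|") + ")\\.daily\\.gzip$");
}

const files = [
  "FGI_WTYUIO_D_2016_04_16_BS.daily.gzip",
  "FGI_WTYUIO_D_2016_04_16_BV.daily.gzip",
  "FGI_GHJK_D_2016_04_16_SATB3.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SAT.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SATB3.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SATBS.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SSD.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SSDBS.daily.gzip",
];

// Non-global regexes, so test() keeps no state between calls.
const pick = (re) => files.filter((f) => re.test(f));

// B3 and BS requested: the FKI BV file (..._SAT) slips through the answer's pattern ...
console.log(pick(answerOneRegex("2016_04_16", "3S")));
// ... but not through the variant.
console.log(pick(buildFileRegex("2016_04_16", "3S")));
console.log(pick(buildFileRegex("2016_04_16", "SV")));
```

The variant keeps the answer's building blocks ([^_]+, (?!SSD), [A-Za-z]{0,3}) and only moves the FKI special case into a separate branch, so no lookbehind is needed.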

You can use a negative lookahead:

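var fileDate = '2016_04_19';
var matchReportCode = 'BS';
// (?!SSD) rejects names whose last part starts with SSD.
// \w already includes digits, so no separate \d is needed.
// The ? makes only the last character of matchReportCode optional.
var re = new RegExp('F[KG]I_\\w+_D_' + fileDate +
  '_(?!SSD)\\w*' + matchReportCode + '?\\.daily\\.gzip');

// re.test('FKI_GHJK_D_2016_04_19_SSDBS.daily.gzip') === false
// re.test('FKI_GHJK_D_2016_04_19_SATBS.daily.gzip') === true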

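Running this pattern over the sample names (with fileDate = '2016_04_16') shows two things worth knowing. It handles one report code at a time, and because the FKI BV file (..._SAT.daily.gzip) contains no BV, it is never returned. Also, the ? after matchReportCode makes only its last character optional, so a name ending in just B passes. In the sketch below, answerTwoRegex is a hypothetical wrapper, and FKI_GHJK_D_2016_04_16_SATB.daily.gzip is a made-up name used only to show the second point.

```javascript
// Hypothetical wrapper around the second answer's pattern
// (\w already covers digits, so [\w\d] is written as \w).
function answerTwoRegex(fileDate, matchReportCode) {
  return new RegExp("F[KG]I_\\w+_D_" + fileDate +
    "_(?!SSD)\\w*" + matchReportCode + "?\\.daily\\.gzip");
}

const files = [
  "FGI_WTYUIO_D_2016_04_16_BS.daily.gzip",
  "FGI_WTYUIO_D_2016_04_16_BV.daily.gzip",
  "FGI_GHJK_D_2016_04_16_SATB3.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SAT.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SATB3.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SATBS.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SSD.daily.gzip",
  "FKI_GHJK_D_2016_04_16_SSDBS.daily.gzip",
];

// Only the FGI BV file is found; the FKI BV file (..._SAT) has no "BV" in its name.
console.log(files.filter((f) => answerTwoRegex("2016_04_16", "BV").test(f)));

// BS works as intended: the FGI BS file and the FKI SATBS file.
console.log(files.filter((f) => answerTwoRegex("2016_04_16", "BS").test(f)));

// The ? applies to the last character only, so "B" alone is accepted
// (made-up file name).
console.log(answerTwoRegex("2016_04_16", "BS").test("FKI_GHJK_D_2016_04_16_SATB.daily.gzip")); // true
```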