
How to select specific substring length based on a filter

I have multiple CSV files with different names, each containing today's date, a customer number, and then the extension. For example:

2019-01-23 XYZF-105.csv
2019-01-23 ABCD-205.csv
2019-01-23 Different nonstandard name.csv
2019-01-23 ##ABCD-305(Trial).csv

I would like to get only the part of the name that contains the customer number, like ABCD-305.

I tried using a substring to select the 8 characters immediately before the dot, but that doesn't work for names that have a suffix like (Trial). Nor does it work to count 11 characters from the beginning, as the result will include the ##. Also, it has to skip the nonstandard names.

I used:

$allitems = Get-ChildItem -Path 'C:\Downloads\Customers\*.csv'
$res = @()
foreach ($item in $allitems){
    $item = $item.Name.substring($item.Name.Length - 12,8)
    $res += $Item
}

This way I get good results for the proper names, but only if the name of the CSV looks like 2019-01-23 ABCD-205.csv.

What would be the way to skip the date, skip the .csv extension, and get only 8-character results that have a dash after the 4th character? Thanks in advance.

Try the following (PSv3+ syntax):

$res = (Get-ChildItem -Path C:\Downloads\Customers\*.csv).Name | 
         Select-String -CaseSensitive '\b[A-Z]{4}-\d{3}\b' |
           ForEach-Object { $_.Matches[0].Value }
  • (Get-ChildItem -Path C:\Downloads\Customers\*.csv).Name outputs the file names of all CSV files in the directory C:\Downloads\Customers.

  • Select-String -CaseSensitive '\b[A-Z]{4}-\d{3}\b' uses case-sensitive regex (regular-expression) matching to select only file names that contain 4 ( {4} ) uppercase characters ( [A-Z] ), followed by - , followed by 3 digits ( \d{3} ), on word boundaries ( \b ).

  • The ForEach-Object script block then outputs the part of each matching file name that matched the regex ( $_.Matches[0].Value ), so that only the relevant portions of matching file names are collected in $res , as an array.
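
The steps above can be tried without touching the file system. The following is a minimal sketch, assuming the four sample names from the question are hard-coded in an array; $names is a hypothetical variable introduced here for illustration and is not part of the original answer.

```powershell
# Sample file names from the question, hard-coded for illustration (assumption).
$names = @(
    '2019-01-23 XYZF-105.csv'
    '2019-01-23 ABCD-205.csv'
    '2019-01-23 Different nonstandard name.csv'
    '2019-01-23 ##ABCD-305(Trial).csv'
)

# Same regex as in the answer: 4 uppercase letters, a dash, 3 digits,
# delimited by word boundaries.
$res = $names |
    Select-String -CaseSensitive '\b[A-Z]{4}-\d{3}\b' |
    ForEach-Object { $_.Matches[0].Value }   # keep only the matched text

$res
```

For these four names, $res should contain XYZF-105, ABCD-205 and ABCD-305; the nonstandard name produces no match and is skipped.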

This would be a good time to use regex. See https://regex101.com/r/AH00n6/1 and understand the following regex:

.*\s[#]*([A-Z]{4}-[0-9]{3}).*.csv

This is a little more than is needed to capture just the names, but it gives more insight into how to control the regex.
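
The answer gives only the regex, so here is one possible way to apply its capture group in PowerShell. This is a sketch under assumptions: the sample names from the question are hard-coded, and $names, $pattern and $res are illustrative names, not part of the original answer.

```powershell
# Sample file names from the question, hard-coded for illustration (assumption).
$names = @(
    '2019-01-23 XYZF-105.csv'
    '2019-01-23 ABCD-205.csv'
    '2019-01-23 Different nonstandard name.csv'
    '2019-01-23 ##ABCD-305(Trial).csv'
)

# Regex from the answer; the parentheses form capture group 1,
# which holds the customer number.
$pattern = '.*\s[#]*([A-Z]{4}-[0-9]{3}).*.csv'

$res = foreach ($name in $names) {
    # -cmatch is the case-sensitive match operator; on success it
    # populates the automatic $Matches variable.
    if ($name -cmatch $pattern) {
        $Matches[1]   # emit only the captured customer number
    }
}

$res
```

Using -cmatch rather than -match matters here: -match is case-insensitive by default, so [A-Z]{4} would also accept lowercase letters.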


 