PowerShell Sort of Strings with Underscores

The following list does not sort properly (IMHO):

$a = @( 'ABCZ', 'ABC_', 'ABCA' )
$a | sort
ABC_
ABCA
ABCZ

My handy ASCII chart and the Unicode C0 Controls and Basic Latin chart both list the underscore (low line) with an ordinal of 95 (U+005F). This is a higher number than the capital letters A-Z (65-90). Sort should have put the string ending with an underscore last.

Get-Culture is en-US.
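The code points quoted from the chart are easy to confirm in PowerShell itself. This minimal sketch prints them and runs an ordinal comparison, which compares exactly these numbers and therefore puts ABC_ after ABCZ. It only confirms the expectation; it does not yet explain what sort does.

```powershell
# Decimal code points of the characters in question
$codes = 'A', 'Z', '_' | ForEach-Object { [int][char]$_ }
$codes -join ', '   # 65, 90, 95

# An ordinal comparison works on exactly these numbers,
# so 'ABC_' (95 in the last position) sorts after 'ABCZ' (90)
$ordinal = [string]::CompareOrdinal('ABC_', 'ABCZ')
$ordinal -gt 0      # True
```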

The next set of commands does what I expect:

$a = @( 'ABCZ', 'ABC_', 'ABCA' )
[System.Collections.ArrayList] $al = $a
$al.Sort( [System.StringComparer]::Ordinal )
$al
ABCA
ABCZ
ABC_

Now I create an ANSI-encoded file containing those same 3 strings:

Get-Content -Encoding Byte data.txt
65 66 67 90 13 10  65 66 67 95 13 10  65 66 67 65 13 10
$a = Get-Content data.txt
[System.Collections.ArrayList] $al = $a
$al.Sort( [System.StringComparer]::Ordinal )
$al
ABC_
ABCA
ABCZ

Once more, the string containing the underscore (low line) is not sorted correctly. What am I missing?


Edit:

Let's reference this example #4:

'A' -lt '_'
False
[char] 'A' -lt [char] '_'
True

It seems like both statements should be False or both should be True. I'm comparing strings in the first statement and Char values in the second. A string is merely a collection of Char values, so I think the two comparison operations should be equivalent.
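One way to see why the two statements differ: [char] operands are compared by their numeric code, while string operands go through a case-insensitive, culture-aware comparison unless an ordinal comparison is requested explicitly. The sketch below puts three kinds of comparison side by side. The culture-aware line is only an approximation of what -lt does internally, and its result can vary with the culture data available to .NET, so it is printed rather than assumed.

```powershell
# [char] operands: compared by numeric UTF-16 code (65 vs 95)
$charLess = [char]'A' -lt [char]'_'

# String operands compared ordinally: same code-point order as the chars
$ordinalLess = [string]::CompareOrdinal('A', '_') -lt 0

# String operands compared linguistically, case-insensitively
# (similar in spirit to -lt on strings; result depends on culture data)
$cultureLess = [string]::Compare('A', '_', [System.StringComparison]::CurrentCultureIgnoreCase) -lt 0

"char: $charLess  ordinal: $ordinalLess  culture: $cultureLess"
```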

And now for example #5:

Get-Content -Encoding Byte data.txt
65 66 67 90 13 10  65 66 67 95 13 10  65 66 67 65 13 10
$a = Get-Content data.txt
$b = @( 'ABCZ', 'ABC_', 'ABCA' )
$a[0] -eq $b[0]; $a[1] -eq $b[1]; $a[2] -eq $b[2];
True
True
True
[System.Collections.ArrayList] $al = $a
[System.Collections.ArrayList] $bl = $b
$al[0] -eq $bl[0]; $al[1] -eq $bl[1]; $al[2] -eq $bl[2];
True
True
True
$al.Sort( [System.StringComparer]::Ordinal )
$bl.Sort( [System.StringComparer]::Ordinal )
$al
ABC_
ABCA
ABCZ
$bl
ABCA
ABCZ
ABC_

The two ArrayLists contain the same strings but are sorted differently. Why?

In many cases PowerShell wraps objects in PSObject and unwraps them again. Usually this happens transparently and you do not even notice it, but in your case it is what causes the trouble.

$a='ABCZ', 'ABC_', 'ABCA'
$a|Set-Content data.txt
$b=Get-Content data.txt

[Type]::GetTypeArray($a).FullName
# System.String
# System.String
# System.String
[Type]::GetTypeArray($b).FullName
# System.Management.Automation.PSObject
# System.Management.Automation.PSObject
# System.Management.Automation.PSObject

As you can see, the objects returned by Get-Content are wrapped in PSObject, which prevents StringComparer from seeing the underlying strings and comparing them properly. A strongly typed string collection cannot store PSObjects, so PowerShell unwraps the strings to store them in such a collection, which lets StringComparer see the strings and compare them properly.
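A practical consequence is that unwrapping the lines before they reach the ArrayList is enough to get the ordinal sort. This sketch uses a [string[]] cast so that plain System.String values are stored, the same idea as a strongly typed collection; the scratch file name sort-demo.txt in the temp directory is made up for the example.

```powershell
$a = 'ABCZ', 'ABC_', 'ABCA'
# Hypothetical scratch file in the temp directory
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sort-demo.txt'
$a | Set-Content -Path $path

# Casting to [string[]] stores the plain System.String values
# instead of the PSObject-wrapped lines Get-Content emits
[string[]] $lines = Get-Content -Path $path
[System.Collections.ArrayList] $al = $lines
Remove-Item -Path $path

# Every element is a real string now, so Ordinal compares code points
$al.Sort([System.StringComparer]::Ordinal)
$al   # ABCA, ABCZ, ABC_ (one per line)
```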

Edit:

First of all, when you write $a[1].GetType() or $b[1].GetType(), you do not call the .NET method directly but a PowerShell method, which normally calls the .NET method on the wrapped object. Thus you cannot get the real type of an object this way. What is more, these methods can be overridden; consider this code:

$c='String'|Add-Member -Type ScriptMethod -Name GetType -Value {[int]} -Force -PassThru
$c.GetType().FullName
# System.Int32

Let us call the .NET method through reflection:

$GetType=[Object].GetMethod('GetType')
$GetType.Invoke($c,$null).FullName
# System.String
$GetType.Invoke($a[1],$null).FullName
# System.String
$GetType.Invoke($b[1],$null).FullName
# System.String

Now we get the real type of $c, but it says that the type of $b[1] is String, not PSObject. As I said, in most cases unwrapping is done transparently, so you see the wrapped String and not the PSObject itself. One particular case where it does not happen: when you pass an array, the array elements are not unwrapped. So let us add an additional level of indirection here:

$Invoke=[Reflection.MethodInfo].GetMethod('Invoke',[Type[]]([Object],[Object[]]))
$Invoke.Invoke($GetType,($a[1],$null)).FullName
# System.String
$Invoke.Invoke($GetType,($b[1],$null)).FullName
# System.Management.Automation.PSObject

Now, because we pass $b[1] as part of an array, we can see its real type: PSObject. That said, I prefer to use [Type]::GetTypeArray instead.

About StringComparer: when the two compared objects are not both strings, StringComparer relies on IComparable.CompareTo for the comparison. PSObject implements the IComparable interface, so the sort is done according to PSObject's IComparable implementation, and that implementation does not compare ordinally, which is why ABC_ still comes first.
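Given that fallback, another option is to keep the wrapped lines and supply a comparer that turns each argument into a plain string before comparing ordinally. This is a sketch that assumes PowerShell 5.0 or later (for class support); the class name OrdinalStringComparer and the temp file name are hypothetical.

```powershell
# Hypothetical comparer: [string] conversion unwraps a PSObject-wrapped
# string, then the two values are compared by code point
class OrdinalStringComparer : System.Collections.IComparer {
    [int] Compare([object] $x, [object] $y) {
        return [string]::CompareOrdinal([string] $x, [string] $y)
    }
}

# Hypothetical scratch file in the temp directory
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sort-demo2.txt'
'ABCZ', 'ABC_', 'ABCA' | Set-Content -Path $path
# As in the question, the ArrayList receives Get-Content's lines directly
[System.Collections.ArrayList] $al = Get-Content -Path $path
Remove-Item -Path $path

$al.Sort([OrdinalStringComparer]::new())
$al   # ABCA, ABCZ, ABC_ (one per line)
```

Because the comparer converts both arguments itself, it gives the same order whether or not the elements happen to be wrapped.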

Windows uses Unicode, not ASCII, so what you're seeing is the Unicode (linguistic) sort order for en-US. The general rules for sorting are:

  1. Numbers, then lowercase and uppercase intermixed.
  2. Special characters occur before numbers.

Extending your example:

$a = @( 'ABCZ', 'ABC_', 'ABCA', 'ABC4', 'abca' )

$a | sort-object
ABC_
ABC4
abca
ABCA
ABCZ
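For contrast, here is the same five-string list sorted with an ordinal comparer instead of Sort-Object. The order then follows the raw code points ('4' = 52, 'A' = 65, 'Z' = 90, '_' = 95, 'a' = 97) rather than the en-US rules above. This is a sketch for comparison only.

```powershell
# Plain string literals, so the ArrayList holds real System.String values
[System.Collections.ArrayList] $al = 'ABCZ', 'ABC_', 'ABCA', 'ABC4', 'abca'

# Ordinal: case-sensitive, pure code-point order
$al.Sort([System.StringComparer]::Ordinal)
$al   # ABC4, ABCA, ABCZ, ABC_, abca (one per line)
```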

If you really want to do this... I will admit it's ugly, but it works. I would create a function if this is something you need to do on a regular basis.

$a = @( 'ABCZ', 'ABC_', 'ABCA', 'ab1z' )
$ascii = @()

# Encode each string as its character codes, zero-padded to 5 digits so
# that a plain text sort of the codes matches their numeric order
foreach ($item in $a) {
    $string = ""
    for ($i = 0; $i -lt $item.Length; $i++) {
        $string += '{0:D5};' -f [int] $item[$i]
    }
    $ascii += $string
}

$b = @()

# Sort the encoded strings, then decode each one back to characters
foreach ($item in ($ascii | Sort-Object)) {
    $string = ""
    # TrimEnd drops the trailing ';' so Split yields no empty last entry
    foreach ($code in $item.TrimEnd(';').Split(';')) {
        $string += [char] [int] $code
    }
    $b += $string
}

$b
ABCA
ABCZ
ABC_
ab1z
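Following the suggestion to wrap this in a function, here is a shorter sketch that skips the encode/decode step and sorts with StringComparer.Ordinal directly. The function name Sort-Ordinal is hypothetical, and it assumes PowerShell 5.0 or later for the ::new() syntax. Binding the input to a [string[]] parameter also converts PSObject-wrapped lines from Get-Content into plain strings.

```powershell
# Hypothetical helper: sorts pipeline strings in ordinal (code-point) order
function Sort-Ordinal {
    param(
        [Parameter(ValueFromPipeline)]
        [string[]] $InputObject
    )
    begin   { $list = [System.Collections.Generic.List[string]]::new() }
    # Each pipeline item arrives here already converted to [string]
    process { $list.AddRange($InputObject) }
    end {
        $list.Sort([System.StringComparer]::Ordinal)
        $list   # emitted one string at a time
    }
}

$sorted = 'ABCZ', 'ABC_', 'ABCA', 'ab1z' | Sort-Ordinal
$sorted   # ABCA, ABCZ, ABC_, ab1z (one per line)
```

With a file, the same call would be Get-Content data.txt | Sort-Ordinal.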

I tried the following, and the sort worked as expected:

[System.Collections.ArrayList] $al = [String[]] $a

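Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM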