
Sorting a list of strings by ignoring (not replacing) non-alphanumeric characters, or by looking at the first alphanumeric character

Basically, I need to sort a list of strings based on a very specific criterion; however, it's not so specific that I believe it needs its own comparator.

Collections.sort gets me about 95% of the way there, since most of what I need is natural ordering. However, for strings like:

"-&4" and "%B", it will put "%B" before "-&4".

What I'd like is for the list to be sorted on the first alphanumeric character, so it would be comparing:

"4" and "B", putting:

"-&4" first, then "%B".

Doing a replaceAll on special characters can't really work because I have to retain the integrity of the string. I went down a rabbit hole of replacing everything, sorting to generate a sort position, and then trying to re-sort the unmodified list, to no avail (it also seems like overkill).

I've spent the past 4 hours googling this and am surprised it's such a novel situation. Most solutions rely on a replaceAll on non-alphanumeric characters, but I need to retain the integrity of the original string.

Apologies if the wording is confusing.

"it's not so specific that I believe it needs its own comparator"

If you don't supply a Comparator, the strings are sorted by their natural order. Since that's not what you want, you definitely need to supply a comparator, and since no built-in comparator does exactly what you want, you need to supply a custom one.

The code below creates a custom comparator using a helper method and a lambda expression or a method reference. Just because you don't write your own class implementing Comparator doesn't mean you're not creating your own comparator.


To sort by only alphanumeric characters, ignoring spaces and special characters, you can do it like this:

List<String> list = ...

Pattern p = Pattern.compile("[^\\p{L}\\p{N}]+");
list.sort(Comparator.comparing(s -> p.matcher(s).replaceAll("")));
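As a rough, self-contained sketch of the snippet above, the following wraps the same regex in a helper method and sorts a copy of the list. The class name AlnumSortDemo, the helper names sortKey and sortByAlphanumerics, and the sample strings are made up for illustration; only the pattern and the Comparator.comparing call come from the answer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

class AlnumSortDemo {
    // Matches runs of characters that are neither Unicode letters nor numbers.
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Hypothetical helper: the sort key is the string with non-alphanumerics stripped.
    static String sortKey(String s) {
        return NON_ALNUM.matcher(s).replaceAll("");
    }

    // Returns a sorted copy; the strings themselves are never modified.
    static List<String> sortByAlphanumerics(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparing(AlnumSortDemo::sortKey));
        return copy;
    }

    public static void main(String[] args) {
        // Keys are "B", "4", "a1" and "C", compared in natural (code point) order.
        System.out.println(sortByAlphanumerics(List.of("%B", "-&4", "#a1", "C")));
        // prints [-&4, %B, C, #a1]
    }
}
```

Note that the keys are still compared case-sensitively, so "C" sorts before "#a1". If that is not wanted, passing String.CASE_INSENSITIVE_ORDER as the second argument to Comparator.comparing is one option.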

If the list is large, you'd likely want to improve performance by caching the normalized string that the sort is using.

List<String> list = ...

Pattern p = Pattern.compile("[^\\p{L}\\p{N}]+");
Map<String, String> normalized = list.stream()
        .collect(Collectors.toMap(s -> s, s -> p.matcher(s).replaceAll(""), (a, b) -> a));
list.sort(Comparator.comparing(normalized::get));
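To make the effect of the cache visible, here is a minimal sketch that counts how often the regex runs. The class CachedKeySortDemo, the keyComputations counter, and the sample data are assumptions added for illustration; the toMap call mirrors the one above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CachedKeySortDemo {
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Hypothetical counter: incremented every time the regex is applied.
    static final AtomicInteger keyComputations = new AtomicInteger();

    static String sortKey(String s) {
        keyComputations.incrementAndGet();
        return NON_ALNUM.matcher(s).replaceAll("");
    }

    // Builds the key map up front (one regex run per list element),
    // then sorts a copy using plain map lookups.
    static List<String> sortWithCachedKeys(List<String> input) {
        Map<String, String> normalized = input.stream()
                .collect(Collectors.toMap(Function.identity(),
                        CachedKeySortDemo::sortKey,
                        (a, b) -> a)); // duplicates: keep the first key
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparing(normalized::get));
        return copy;
    }

    public static void main(String[] args) {
        List<String> sorted = sortWithCachedKeys(List.of("%B", "-&4", "#a1", "C", "%B"));
        System.out.println(sorted);                // prints [-&4, %B, %B, C, #a1]
        System.out.println(keyComputations.get()); // prints 5
    }
}
```

Without the map, the key extractor runs twice per comparison, so the number of regex evaluations grows with the number of comparisons the sort performs rather than with the list size.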

Regex explained

  • \p{L} matches all characters in Unicode category "Letter".
  • \p{N} matches all characters in Unicode category "Number".
  • [^\p{L}\p{N}] matches all characters that are not "Letter" or "Number".
  • "[^\\p{L}\\p{N}]+" is the Java string literal (with backslashes escaped) for the pattern matching one or more of those characters.
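The bullets above can be checked with a small sketch that applies the pattern to a few sample strings. The class RegexKeyDemo, the method stripNonAlphanumerics, and the inputs are hypothetical; the pattern is the one from the answer.

```java
import java.util.regex.Pattern;

class RegexKeyDemo {
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Removes every run of characters that are neither letters nor numbers.
    static String stripNonAlphanumerics(String s) {
        return NON_ALNUM.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripNonAlphanumerics("-&4"));   // prints 4
        System.out.println(stripNonAlphanumerics("a b_c")); // prints abc
        // \u00e9 is a precomposed accented letter (category "Letter"), so it is kept.
        System.out.println(stripNonAlphanumerics("n\u00e9e-2").equals("n\u00e9e2")); // prints true
        // A string with no letters or numbers yields an empty key.
        System.out.println(stripNonAlphanumerics("!!!").isEmpty()); // prints true
    }
}
```

Strings whose keys are equal, such as "!!!" and "???", compare as equal under this comparator; because List.sort is stable, they keep their original relative order.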
