
Splitting strings & Pattern matching in Java

I have the following string:

MYLMFILLAAGCSKMYLLFINNAARPFASSTKAASTVVTPHHSYTSKPHHSTTSHCKSSD

I want to split this string every time a K or R is encountered, except when it is followed by a P.

Therefore, I want the following output:

MYLMFILLAAGCSK
MYLLFINNAARPFASSTK
AASTVVTPHHSYTSKPHHSTTSHCK
SSD

At first, I tried using the simple .split() function in Java, but I couldn't get the desired result, because I don't know how to tell .split() not to split when a P comes right after the K or R.

I've looked at other similar questions, and they suggest using Pattern matching, but I don't know how to use it in this context.

You can use split:

String[] parts = str.split("(?<=[KR])(?!P)");

Because you want to keep the characters you're splitting on, you must use a look-behind, which asserts without consuming. There are two look-arounds:

  • (?<=[KR]) means "the previous char is either K or R"
  • (?!P) means "the next char is not a P"

This regex matches the zero-width positions between characters where you want to split.
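To see what each look-around contributes, the sketch below compares the look-behind on its own with the combined regex. The class name, method names, and the short sample string are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

class LookaroundDemo {
    // Look-behind only: splits after every K or R, even when a P follows.
    static String[] splitAfterKR(String s) {
        return s.split("(?<=[KR])");
    }

    // Look-behind plus negative look-ahead: no split when the next char is P.
    static String[] splitAfterKRUnlessP(String s) {
        return s.split("(?<=[KR])(?!P)");
    }

    public static void main(String[] args) {
        String sample = "AKPBRC"; // hypothetical short input
        System.out.println(Arrays.toString(splitAfterKR(sample)));        // [AK, PBR, C]
        System.out.println(Arrays.toString(splitAfterKRUnlessP(sample))); // [AKPBR, C]
    }
}
```

Neither look-around consumes a character, so every letter of the input ends up in exactly one fragment.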


Some test code:

import java.util.Arrays;

String str = "MYLMFILLAAGCSKMYLLFINNAARPFASSTKAASTVVTPHHSYTSKPHHSTTSHCKSSD";
Arrays.stream(str.split("(?<=[KR])(?!P)")).forEach(System.out::println);

Output:

MYLMFILLAAGCSK
MYLLFINNAARPFASSTK
AASTVVTPHHSYTSKPHHSTTSHCK
SSD
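The question also asks how Pattern matching could be used here. As an alternative sketch (not the method from the answer above), the fragments can be collected with Pattern and Matcher instead of split. The regex and the names FragmentFinder and fragments are assumptions made for this example: each match is either the shortest run ending in a K or R that is not followed by P, or whatever remains at the end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FragmentFinder {
    // First alternative: shortest run ending in K or R not followed by P.
    // Second alternative: the leftover tail that contains no split point.
    private static final Pattern FRAGMENT = Pattern.compile(".*?[KR](?!P)|.+");

    static List<String> fragments(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = FRAGMENT.matcher(s);
        while (m.find()) {
            out.add(m.group()); // each match is one complete fragment
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "MYLMFILLAAGCSKMYLLFINNAARPFASSTKAASTVVTPHHSYTSKPHHSTTSHCKSSD";
        fragments(str).forEach(System.out::println);
    }
}
```

For the string from the question, this should print the same four fragments as the split version.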

Just try this regexp:

([KR])([^P]|$)

and substitute each match with

\1\n\2

(written "$1\n$2" in Java's replacement syntax), as illustrated in the demo. No negative lookahead is needed. However, you cannot use this regex directly with split, because split would also remove the non-P character that follows the K or R.

You can do a first transform like the one above, and then .split("\n"); so it should be:

"MYLMFILLAAGCSKMYLLFINNAARPFASSTKAASTVVTPHHSYTSKPHHSTTSHCKSSDK"
    .replaceAll("([KR])([^P]|$)", "$1\n$2").split("\n");
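A runnable sketch of this replace-then-split idea follows. It assumes Java's String.replaceAll stands in for the substitution step, that group references are written $1 and $2, and that the character class is [KR] so that R is handled as well as K. The class and method names are hypothetical.

```java
import java.util.Arrays;

class ReplaceThenSplit {
    static String[] splitByReplace(String s) {
        // Put a newline between a K or R and the non-P char after it
        // (or the end of the input), then split on that newline.
        String marked = s.replaceAll("([KR])([^P]|$)", "$1\n$2");
        return marked.split("\n");
    }

    public static void main(String[] args) {
        String str = "MYLMFILLAAGCSKMYLLFINNAARPFASSTKAASTVVTPHHSYTSKPHHSTTSHCKSSD";
        Arrays.stream(splitByReplace(str)).forEach(System.out::println);

        // Two adjacent split points: the regex consumes the R as group 2,
        // so only one split happens here.
        System.out.println(Arrays.toString(splitByReplace("AKRB"))); // [AK, RB]
    }
}
```

Because the regex consumes the character after the K or R, an input with two adjacent split points such as "AKRB" yields [AK, RB] rather than [AK, R, B]. The zero-width look-around version does not have this limitation.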
