Splitting up input using regular expressions in Java

I am making a program that lets a user input a chemical formula, for example C9H11N02. When they enter it, I want to split it into pieces such as C9, H11, N, 02. Once it is split up like this, I want to change the pieces so that the formula becomes C10H12N203, and then put it back together. This is what I have done so far. With the regular expression I have used I can extract the integer values, but how would I go about getting C10, H11, etc.?

System.out.println("Enter Data");

Scanner k = new Scanner( System.in );
String input = k.nextLine();

String reg = "\\s\\s\\s";
String [] data;

data = input.split( reg );

int m = Integer.parseInt( data[0] );
int n = Integer.parseInt( data[1] );

It can be done using look arounds:

String[] parts = input.split("(?<=.)(?=[A-Z])");

Look arounds are zero-width, non-consuming assertions.

This regex splits the input at every position where both look arounds match:

  • (?<=.) means "there is a preceding character" (i.e. not at the start of input)
  • (?=[A-Z]) means "the next character is a capital letter" (all element symbols start with A-Z)
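To see why the zero-width property matters, here is a small sketch that compares a consuming split with the look-around split on the same input. The class and method names (LookaroundDemo, splitConsuming, splitZeroWidth) are made up for illustration.

```java
import java.util.Arrays;

public class LookaroundDemo {

    // Splitting on a consuming pattern: each capital letter is treated
    // as a delimiter, so it disappears from the result.
    static String[] splitConsuming(String formula) {
        return formula.split("[A-Z]");
    }

    // Splitting on zero-width look arounds: the split happens between
    // characters, so every character of the input is kept.
    static String[] splitZeroWidth(String formula) {
        return formula.split("(?<=.)(?=[A-Z])");
    }

    public static void main(String[] args) {
        String input = "C9H11NO2";
        System.out.println(Arrays.toString(splitConsuming(input))); // [, 9, 11, , 2]
        System.out.println(Arrays.toString(splitZeroWidth(input))); // [C9, H11, N, O2]
    }
}
```

The consuming version loses the element symbols and leaves empty strings behind, which is why the split has to be expressed with assertions that match a position rather than a character.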

Here's a test, including a couple of two-letter symbols as edge cases:

// requires: import java.util.Arrays;
public static void main(String[] args) {
    String input = "C9KrBr2H11NO2";
    String[] parts = input.split("(?<=.)(?=[A-Z])");
    System.out.println(Arrays.toString(parts));
}

Output:

[C9, Kr, Br2, H11, N, O2]

If you then want to split each component into its symbol and its count, use a nested call to split():

public static void main(String[] args) {
    String input = "C9KrBr2H11NO2";
    for (String component : input.split("(?<=.)(?=[A-Z])")) {
        // split on non-digit/digit boundary
        String[] symbolAndNumber = component.split("(?<!\\d)(?=\\d)");
        String element = symbolAndNumber[0];
        // elements without numbers won't be split
        String count = symbolAndNumber.length == 1 ? "1" : symbolAndNumber[1];
        System.out.println(element + " x " + count);
    }
}

Output:

C x 9
Kr x 1
Br x 2
H x 11
N x 1
O x 2
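The question also asks about changing the counts and putting the formula back together. One possible way to do that, building on the two splits above, is sketched here. The class FormulaEditor and its methods parse and format are hypothetical names, and the sketch assumes a non-empty, flat formula (no brackets) that uses the letter O rather than the digit 0 for oxygen.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FormulaEditor {

    // Parse a flat formula such as "C9H11NO2" into element -> count,
    // keeping the order in which the elements first appear.
    static Map<String, Integer> parse(String formula) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String component : formula.split("(?<=.)(?=[A-Z])")) {
            String[] symbolAndNumber = component.split("(?<!\\d)(?=\\d)");
            // Components without digits are not split, so default to 1.
            int count = symbolAndNumber.length == 1
                    ? 1
                    : Integer.parseInt(symbolAndNumber[1]);
            // A repeated element has its counts added together.
            counts.merge(symbolAndNumber[0], count, Integer::sum);
        }
        return counts;
    }

    // Rebuild the formula text, omitting a count of 1.
    static String format(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            sb.append(entry.getKey());
            if (entry.getValue() != 1) {
                sb.append(entry.getValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = parse("C9H11NO2");
        counts.replaceAll((element, count) -> count + 1); // add one atom of each element
        System.out.println(format(counts)); // C10H12N2O3
    }
}
```

Note that keeping the counts in a map merges repeated elements, so an input like CH3COOH would come back as C2H4O2. If the original layout has to be preserved, a list of symbol/count pairs would be a better fit than a map.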

Did you accidentally put zeroes into some of those formulas where the letter "O" (oxygen) was supposed to be? If so:

"C10H12N2O3".split("(?<=[0-9A-Za-z])(?=[A-Z])");

[C10, H12, N2, O3]

"CH2BrCl".split("(?<=[0-9A-Za-z])(?=[A-Z])");

[C, H2, Br, Cl]

I believe the following code should allow you to extract the various elements and their associated counts. Of course, brackets make things more complicated, but you didn't ask about them!

// requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
Pattern pattern = Pattern.compile("([A-Z][a-z]*)([0-9]*)");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    String element = matcher.group(1);
    // Group 2 always takes part in the match, but it is the empty
    // string when the element has no explicit count, so default to 1.
    String digits = matcher.group(2);
    int count = digits.isEmpty() ? 1 : Integer.parseInt(digits);
    // Do stuff with this component
}
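The same pattern can also rewrite the formula in place, which covers the "put it back together" part of the question. The following is only a sketch: FormulaRewriter and addToEachCount are hypothetical names, and it assumes every count should be shifted by the same amount. It uses Matcher.appendReplacement and Matcher.appendTail, so any text the pattern does not match is copied through unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormulaRewriter {

    private static final Pattern COMPONENT =
            Pattern.compile("([A-Z][a-z]*)([0-9]*)");

    // Return the formula with delta added to the count of every component.
    static String addToEachCount(String formula, int delta) {
        Matcher matcher = COMPONENT.matcher(formula);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String digits = matcher.group(2);
            // An absent count means one atom.
            int count = digits.isEmpty() ? 1 : Integer.parseInt(digits);
            int updated = count + delta;
            // Omit the number again when the new count is 1.
            String replacement = matcher.group(1) + (updated == 1 ? "" : updated);
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(addToEachCount("C9H11NO2", 1)); // C10H12N2O3
        System.out.println(addToEachCount("CH2BrCl", 1));  // C2H3Br2Cl2
    }
}
```

Unlike a map-based approach, this keeps repeated elements and the original ordering exactly as they were typed.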
