Java - Split String by Number and Letters

So I have, for example, a string such as C3H20IO.

What I want to do is split this string so that I get the following:

Array1 = {C,H,I,O}
Array2 = {3,20,1,1}

The 1 as the third element of Array2 indicates that the I element appears as a single atom. The same goes for O. That is actually the part I am struggling with.

This is a chemical formula, so I need to separate the elements by name and by the number of atoms of each.

You could try this approach:

String formula = "C3H20IO";

// insert "1" at every atom-atom boundary and after a trailing atom
formula = formula.replaceAll("(?<=[A-Z])(?=[A-Z])|(?<=[a-z])(?=[A-Z])|(?<=\\D)$", "1");

// split at every letter-digit or digit-letter boundary
String regex = "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)";
String[] atoms = formula.split(regex);

Output:

atoms: [C, 3, H, 20, I, 1, O, 1]

Now all even indices (0, 2, 4...) hold the atoms and the odd ones hold the associated count:

String[] a = new String[ atoms.length/2 ];
int[] n = new int[ atoms.length/2 ];

for(int i = 0 ; i < a.length ; i++) {
    a[i] = atoms[i*2];
    n[i] = Integer.parseInt(atoms[i*2+1]);
}

Output:

a: [C, H, I, O]
n: [3, 20, 1, 1]
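To see how the two regexes above behave with two-letter symbols, they can be wrapped in a small class. This is only a sketch: the class name `FormulaSplitter` and the method `split` are made up for illustration, and the input is assumed to be a well-formed formula without parentheses or charges.

```java
import java.util.Arrays;

public class FormulaSplitter {

    /** Returns tokens alternating symbol, count, symbol, count, ... */
    public static String[] split(String formula) {
        // Insert "1" where one symbol is directly followed by another symbol,
        // and after a symbol at the very end of the string.
        String padded = formula.replaceAll(
                "(?<=[A-Z])(?=[A-Z])|(?<=[a-z])(?=[A-Z])|(?<=\\D)$", "1");
        // Split at every letter-digit or digit-letter boundary.
        return padded.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("C3H20IO"))); // [C, 3, H, 20, I, 1, O, 1]
        System.out.println(Arrays.toString(split("NaCl")));    // [Na, 1, Cl, 1]
    }
}
```

The lowercase-to-uppercase alternative is what keeps `Na` together while still inserting a count before `Cl`.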

You can use a regular expression to step through your input using the Matcher.find() method.

Here is a rough example of what it may look like:

    String input = "C3H20IO";

    List<String> array1 = new ArrayList<String>();
    List<Integer> array2 = new ArrayList<Integer>();

    Pattern pattern = Pattern.compile("([A-Z][a-z]*)([0-9]*)");
    Matcher matcher = pattern.matcher(input);               
    while(matcher.find()){
        array1.add(matcher.group(1));

        String atomAmount = matcher.group(2);
        int atomAmountInt = 1;
        if((atomAmount != null) && (!atomAmount.isEmpty())){
            atomAmountInt = Integer.valueOf(atomAmount);
        }
        array2.add(atomAmountInt);
    }

I know, the conversion from List to Array is missing, but it should give you an idea of how to approach your problem.

An approach without regex, with the data stored in ArrayLists:
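For the List-to-array conversion that the answer leaves out, one possible sketch is shown here. The class name `FormulaMatcher` and the methods `symbols` and `counts` are hypothetical names chosen for the example; the pattern is the one from the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormulaMatcher {

    // One element symbol followed by an optional count.
    private static final Pattern ATOM = Pattern.compile("([A-Z][a-z]*)([0-9]*)");

    public static String[] symbols(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ATOM.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        // List<String> -> String[]
        return out.toArray(new String[0]);
    }

    public static int[] counts(String input) {
        List<Integer> out = new ArrayList<>();
        Matcher m = ATOM.matcher(input);
        while (m.find()) {
            String digits = m.group(2);
            // An empty second group means the count was omitted, i.e. 1.
            out.add(digits.isEmpty() ? 1 : Integer.parseInt(digits));
        }
        // List<Integer> -> int[] (unboxing each element)
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(symbols("C3H20IO"))); // [C, H, I, O]
        System.out.println(Arrays.toString(counts("C3H20IO")));  // [3, 20, 1, 1]
    }
}
```

Scanning the input twice keeps the example short; a single pass that fills both lists before converting works just as well.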

String s = "C3H20IO";

char chem = '-';
String val = "";
boolean isFirst = true;
List<Character> chemList = new ArrayList<Character>();
List<Integer> weightList = new ArrayList<Integer>();
for (char c : s.toCharArray()) {
    if (Character.isLetter(c)) {
        if (!isFirst) {
            // a new letter closes the previous element; no digits means 1
            chemList.add(chem);
            weightList.add(Integer.valueOf(val.equals("") ? "1" : val));
            val = "";
        }
        chem = c;
    } else if (Character.isDigit(c)) {
        val += c;
    }
    isFirst = false;
}
// flush the last element
chemList.add(chem);
weightList.add(Integer.valueOf(val.equals("") ? "1" : val));

System.out.println(chemList);
System.out.println(weightList);

Output:

[C, H, I, O]
[3, 20, 1, 1]
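The loop above treats every letter as its own element, so a two-letter symbol such as Fe would be split into F and e. A possible variation, assuming that an uppercase letter always starts a new symbol and lowercase letters continue it, could look like this. The class `FormulaScanner` and its methods are hypothetical names for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class FormulaScanner {

    /** Appends the symbols and counts found in s to the two given lists. */
    public static void scan(String s, List<String> symbols, List<Integer> counts) {
        StringBuilder symbol = new StringBuilder();
        StringBuilder digits = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (Character.isUpperCase(c)) {
                // An uppercase letter starts a new element: store the previous one.
                flush(symbol, digits, symbols, counts);
                symbol.append(c);
            } else if (Character.isLowerCase(c)) {
                symbol.append(c);   // continues the current symbol, e.g. the "e" in "Fe"
            } else if (Character.isDigit(c)) {
                digits.append(c);
            }
        }
        flush(symbol, digits, symbols, counts);   // store the last element
    }

    private static void flush(StringBuilder symbol, StringBuilder digits,
                              List<String> symbols, List<Integer> counts) {
        if (symbol.length() > 0) {
            symbols.add(symbol.toString());
            // No digits collected means an implicit count of 1.
            counts.add(digits.length() == 0 ? 1 : Integer.parseInt(digits.toString()));
        }
        symbol.setLength(0);
        digits.setLength(0);
    }

    public static void main(String[] args) {
        List<String> symbols = new ArrayList<>();
        List<Integer> counts = new ArrayList<>();
        scan("Fe2O3", symbols, counts);
        System.out.println(symbols); // [Fe, O]
        System.out.println(counts);  // [2, 3]
    }
}
```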

This works assuming each element starts with a capital letter, i.e. if you have "Fe" you don't represent it in the String as "FE". Basically, you split the string on each capital letter, then split each piece into letters and numbers, adding "1" if the piece contains no number.

String s = "C3H20IO";
List<String> letters = new ArrayList<>();
List<String> numbers = new ArrayList<>();

String[] arr = s.split("(?=\\p{Upper})");  // [C3, H20, I, O]
for (String str : arr) {  // [C, 3] : [H, 20] : [I] : [O]
    String[] temp = str.split("(?=\\d)", 2);
    letters.add(temp[0]);
    if (temp.length == 1) {
        numbers.add("1");  // no digits: implicit count of 1
    } else {
        numbers.add(temp[1]);
    }
}
System.out.println(letters); // [C, H, I, O]
System.out.println(numbers); // [3, 20, 1, 1]

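Loop over the input with a for loop (here `input` stands for your string) and add the following conditions:

for (int i = 0; i < input.length(); i++) {
    char c = input.charAt(i);
    if (Character.isDigit(c)) {
        // add it to the number array
    } else if (Character.isLetter(c)) {
        // add it to the character array
    }
}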

I suggest splitting before each uppercase letter using a zero-width lookahead regex (to extract items like C12, O2, Si), then splitting each item into the element and its numeric weight:

List<String> elements = new ArrayList<>();
List<Integer> weights = new ArrayList<>();

String[] items = "C6H12Si6OH".split("(?=[A-Z])");  // [C6, H12, Si6, O, H]
for (String item : items) {
    String[] pair = item.split("(?=[0-9])", 2);    // e.g. H12 => [H, 12], O => [O]
    elements.add(pair[0]);
    weights.add(pair.length > 1 ? Integer.parseInt(pair[1]) : 1);
}
System.out.println(elements);  // [C, H, Si, O, H]
System.out.println(weights);   // [6, 12, 6, 1, 1]

Is this good? (Not using split)

String line = "C3H20ZnO2ABCD";
String pattern = "([A-Z][a-z]*)(((?=[A-Z][a-z]*|$))|\\d+)";

Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);

while (m.find()) {
    System.out.print(m.group(1));
    if (m.group(2).length() == 0) {
        System.out.println(" 1");
    } else {
        System.out.println(" " + m.group(2));
    }
}

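You can use two patterns:

  • [0-9]+
  • [a-zA-Z]+

Split the string once by each of them. Note that the two lists only line up when every element except possibly the last one carries an explicit count.

String test = "C3H20O";  // example input; only the last count is omitted
List<String> letters = Arrays.asList(test.split("[0-9]+"));      // [C, H, O]
List<String> numbers = Arrays.stream(test.split("[a-zA-Z]+"))    // ["", 3, 20]
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toCollection(ArrayList::new));   // mutable list

if (letters.size() != numbers.size()) {
    numbers.add("1");  // the last element had no count
}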

You can split the string by using a regular expression like (?<=\\D)(?=\\d). Try this:

String alphanum= "abcd1234";
String[] part = alphanum.split("(?<=\\D)(?=\\d)");
System.out.println(part[0]);
System.out.println(part[1]);

will output

abcd
1234
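The pattern above only splits where a non-digit is followed by a digit, so a string with more than one letter run stays partly joined. A small sketch to compare it with a two-way variant that also splits where a digit is followed by a non-digit; the class `BoundarySplit` and its method names are made up for illustration.

```java
import java.util.Arrays;

public class BoundarySplit {

    // Splits only where a non-digit is followed by a digit.
    public static String[] letterToDigit(String s) {
        return s.split("(?<=\\D)(?=\\d)");
    }

    // Splits at both letter-to-digit and digit-to-letter boundaries.
    public static String[] bothWays(String s) {
        return s.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(letterToDigit("abcd1234"))); // [abcd, 1234]
        System.out.println(Arrays.toString(letterToDigit("C3H20")));    // [C, 3H, 20]
        System.out.println(Arrays.toString(bothWays("C3H20")));         // [C, 3, H, 20]
    }
}
```

Because both lookarounds are zero-width, no characters are consumed by the split; every character of the input ends up in one of the parts.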

I did it as follows. Note that a run of adjacent letters without counts, such as IO, stays together as one string:

ArrayList<Integer> integerCharacters = new ArrayList<>();
ArrayList<String> stringCharacters = new ArrayList<>();

String value = "C3H20IO"; //Your value 
String[] strSplitted = value.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)"); //Split numeric and strings

for(int i=0; i<strSplitted.length; i++){

    if (Character.isLetter(strSplitted[i].charAt(0))){
        stringCharacters.add(strSplitted[i]); //If string then add to strings array
    }
    else{
        integerCharacters.add(Integer.parseInt(strSplitted[i])); //else add to integer array
    }
}
