
String tokenizer in Java

I have a text file that contains data separated by '|'. I need to get each field (separated by '|') and process it. The text file looks like this:

ABC|DEF||FGHT

I am using StringTokenizer (JDK 1.4) to get each field value. The problem is that I should get an empty string after DEF, but I am not getting the empty field between DEF and FGHT.

My result should be ABC,DEF,"",FGHT but I am getting ABC,DEF,FGHT.

From the StringTokenizer documentation:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

The following code should work:

String s = "ABC|DEF||FGHT";
String[] r = s.split("\\|");
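As a quick check of what split returns for this input, here is a minimal sketch. The class name SplitDemo and the helper splitFields are made up for illustration. It also shows one detail worth knowing: split(String) drops trailing empty strings, while the two-argument overload with a negative limit keeps them.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
class SplitDemo {
    // Splits a pipe-delimited record and keeps every empty field,
    // including trailing ones (negative limit).
    public static String[] splitFields(String line) {
        // "\\|" escapes the pipe, which would otherwise mean regex alternation.
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        // Empty field in the middle is preserved: [ABC, DEF, , FGHT]
        System.out.println(Arrays.toString(splitFields("ABC|DEF||FGHT")));
        // Without a limit, trailing empty fields are dropped: [ABC, DEF]
        System.out.println(Arrays.toString("ABC|DEF||".split("\\|")));
        // With limit -1 they are kept: [ABC, DEF, , ]
        System.out.println(Arrays.toString(splitFields("ABC|DEF||")));
    }
}
```

If the file can contain records that end with empty fields, the -1 limit is probably the variant you want.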

Use the returnDelims flag and check for two consecutive occurrences of the delimiter:

String str = "ABC|DEF||FGHT";
String delim = "|";
StringTokenizer tok = new StringTokenizer(str, delim, true);

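boolean expectDelim = false;
while (tok.hasMoreTokens()) {
    String token = tok.nextToken();
    boolean isDelim = delim.equals(token);
    if (isDelim) {
        if (expectDelim) {
            // this delimiter just terminates the previous token
            expectDelim = false;
            continue;
        }
        // unexpected delim means empty token
        token = null;
    }

    System.out.println(token);
    // after an empty token the delimiter is already consumed,
    // so another delimiter right away means another empty token
    expectDelim = !isDelim;
}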

This prints:

ABC
DEF
null
FGHT

The API isn't pretty and is therefore considered legacy (i.e. "almost obsolete"). Use it only where pattern matching is too expensive (which should only be the case for extremely long strings) or where an API expects an Enumeration.

In case you switch to String.split(String), make sure to quote the delimiter, either manually ( "\\|" ) or automatically using string.split(Pattern.quote(delim));
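To illustrate why quoting matters, here is a small sketch. The class QuoteDelimiterDemo and the method splitLiteral are hypothetical names. Note that Pattern.quote was added in Java 5, so on JDK 1.4 you would have to escape the delimiter by hand.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class QuoteDelimiterDemo {
    // Splits on the delimiter taken literally, whatever regex
    // metacharacters it contains; keeps empty fields (limit -1).
    public static String[] splitLiteral(String text, String delim) {
        return text.split(Pattern.quote(delim), -1);
    }

    public static void main(String[] args) {
        String str = "ABC|DEF||FGHT";
        // Unquoted "|" is an empty alternation that matches between
        // characters, so the string is broken up character by character.
        System.out.println(Arrays.toString(str.split("|")));
        // Quoted, the pipe is a literal: [ABC, DEF, , FGHT]
        System.out.println(Arrays.toString(splitLiteral(str, "|")));
    }
}
```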

StringTokenizer ignores empty elements. Consider using String.split, which is also available in 1.4.

From the javadocs:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

You can use the constructor that takes an extra 'returnDelims' boolean and pass true to it. This way you will receive the delimiters as tokens, which allows you to detect this condition.

Alternatively, you can just implement your own string tokenizer that does what you need; it's not that hard.
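A hand-rolled tokenizer along those lines could look roughly like this. It is only a sketch: the class SimpleTokenizer and its tokenize method are invented for illustration, and it assumes a single-character delimiter. It uses generics, so on JDK 1.4 you would use a raw List instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class SimpleTokenizer {
    // Returns every field between delimiters, including empty ones.
    public static List<String> tokenize(String text, char delim) {
        List<String> fields = new ArrayList<String>();
        int start = 0;
        int end;
        // Walk from delimiter to delimiter with indexOf.
        while ((end = text.indexOf(delim, start)) >= 0) {
            fields.add(text.substring(start, end));
            start = end + 1;
        }
        // Whatever follows the last delimiter is the final field.
        fields.add(text.substring(start));
        return fields;
    }

    public static void main(String[] args) {
        // Prints [ABC, DEF, , FGHT]
        System.out.println(tokenize("ABC|DEF||FGHT", '|'));
    }
}
```

Because it never goes through the regex engine, this also avoids the quoting question entirely.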

Here is another way to solve this problem:

String str = "ABC|DEF||FGHT";
StringTokenizer s = new StringTokenizer(str, "|", true);
String currentToken = "", previousToken = "";

while (s.hasMoreTokens()) {
    // Get the next token (delimiters are returned as tokens too)
    currentToken = s.nextToken();

    if (currentToken.equals("|") && previousToken.equals("|")) {
        // Two delimiters in a row mean an empty token, so we print null
        System.out.println("null");
    } else if (!currentToken.equals("|")) {
        // Print only the real tokens, not the delimiters
        System.out.println(currentToken);
    }

    previousToken = currentToken;
}

package com.java.String;

import java.util.StringTokenizer;

public class StringWordReverse {

    public static void main(String[] kam) {
        String s;
        String sReversed = "";
        System.out.println("Enter a string to reverse");
        s = "THIS IS ASHIK SKLAB";
        StringTokenizer st = new StringTokenizer(s);


        while (st.hasMoreTokens()) {
            sReversed = st.nextToken() + " " + sReversed;
        }

        System.out.println("Original string is : " + s);
        System.out.println("Reversed string is : " + sReversed);

    }
}

Output:

Enter a string to reverse
Original string is : THIS IS ASHIK SKLAB
Reversed string is : SKLAB ASHIK IS THIS

Here is a way to split a string into tokens (a token is one or more letters):

import java.util.Scanner;
import java.util.StringTokenizer;

public class LetterTokens {
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String s = scan.nextLine();
        // Replace every non-letter with a space so only runs of letters remain
        s = s.replaceAll("[^A-Za-z]", " ");
        StringTokenizer arr = new StringTokenizer(s, " ");
        // Print the number of tokens, then each token on its own line
        int n = arr.countTokens();
        System.out.println(n);
        while (arr.hasMoreTokens()) {
            System.out.println(arr.nextToken());
        }
        scan.close();
    }
}
