
What is the Java equivalent of sscanf for parsing values from a string using a known pattern?

I come from a C background (originally, though I haven't used the language for almost 5 years) and I'm trying to parse some values from a string in Java. In C I would use sscanf. In Java, people have told me to "use Scanner, or StringTokenizer", but I can't see how to use them to achieve my purpose.

My input string looks like "17-MAR-11 15.52.25.000000000". In C I would do something like:

sscanf(thestring, "%d-%3s-%d %d.%d.%d.%d", &day, month, &year, &hour, &min, &sec, &fracpart);

But in Java, all I can do is things like:

scanner.nextInt();

This doesn't allow me to check the pattern, and for "MAR" I end up having to do things like:

str.substring(3,6);

Horrible! Surely there is a better way?

The problem is that Java has no out parameters (or pass-by-reference) as C and C# do.

But there is a better (and more robust) way. Use regular expressions:

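Pattern p = Pattern.compile("(\\d+)-(\\p{Alpha}+)-(\\d+) (\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)");
Matcher m = p.matcher("17-MAR-11 15.52.25.000000000");
// group() throws IllegalStateException unless a match has been attempted first
if (m.matches()) {
    String day = m.group(1);
    String month = m.group(2);
    // ... groups 3 to 7 hold year, hour, min, sec and fracpart
}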

Of course the C code is more concise, but this technique has one advantage: a pattern specifies the format more precisely than '%s' and '%d'. For example, you can use \\d{2} to require that the day consist of exactly 2 digits.
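To make the regular-expression approach concrete, here is a minimal, self-contained sketch. The class name RegexTimestampDemo and the helper scan are made up for illustration, and the fixed digit counts in the pattern are an assumption about the input format rather than something the question states.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class RegexTimestampDemo {
    // Assumed format: 2-digit day, 3-letter month, 2-digit year,
    // 2-digit hour/min/sec, then a fraction of any length.
    private static final Pattern TIMESTAMP = Pattern.compile(
            "(\\d{2})-(\\p{Alpha}{3})-(\\d{2}) (\\d{2})\\.(\\d{2})\\.(\\d{2})\\.(\\d+)");

    /** Returns {day, month, year, hour, min, sec, fracpart}, or null if the input does not match. */
    static String[] scan(String input) {
        Matcher m = TIMESTAMP.matcher(input);
        if (!m.matches()) {
            return null; // the whole string must fit the pattern
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // capture groups are numbered from 1
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] f = scan("17-MAR-11 15.52.25.000000000");
        if (f == null) {
            System.out.println("no match");
            return;
        }
        int day = Integer.parseInt(f[0]);
        String month = f[1];
        int year = Integer.parseInt(f[2]);
        System.out.println("day=" + day + " month=" + month + " year=" + year);
    }
}
```

Returning an array (or a small value class) stands in for the out parameters that sscanf relies on; the numeric fields still have to be converted with Integer.parseInt.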

Here is a solution using scanners:

Scanner scanner = new Scanner("17-MAR-11 15.52.25.000000000");

Scanner dayScanner = new Scanner(scanner.next());
Scanner timeScanner = new Scanner(scanner.next());

dayScanner.useDelimiter("-");
System.out.println("day=" + dayScanner.nextInt());
System.out.println("month=" + dayScanner.next());
System.out.println("year=" + dayScanner.nextInt());

timeScanner.useDelimiter("\\.");
System.out.println("hour=" + timeScanner.nextInt());
System.out.println("min=" + timeScanner.nextInt());
System.out.println("sec=" + timeScanner.nextInt());
System.out.println("fracpart=" + timeScanner.nextInt());
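Scanner can also check the whole pattern in one step, which is probably the closest it gets to sscanf: findInLine applies a regular expression and match() then exposes the capture groups. The sketch below is a variation on the answer above, not the original author's code; the class name ScannerMatchDemo and the helper scan are hypothetical. Note that findInLine searches anywhere in the line rather than anchoring at the start.

```java
import java.util.Scanner;
import java.util.regex.MatchResult;

// Hypothetical demo class, not part of any library.
class ScannerMatchDemo {
    /** Returns the match with groups 1..7, or null if the pattern is not found in the line. */
    static MatchResult scan(String input) {
        try (Scanner scanner = new Scanner(input)) {
            // findInLine ignores delimiters and returns null when nothing matches
            String found = scanner.findInLine(
                    "(\\d+)-(\\p{Alpha}+)-(\\d+) (\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)");
            if (found == null) {
                return null;
            }
            // match() returns the result of the last scanning operation
            return scanner.match();
        }
    }

    public static void main(String[] args) {
        MatchResult r = scan("17-MAR-11 15.52.25.000000000");
        if (r != null) {
            System.out.println("day=" + Integer.parseInt(r.group(1)));
            System.out.println("month=" + r.group(2));
            System.out.println("hour=" + Integer.parseInt(r.group(4)));
        }
    }
}
```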

None of these examples were really satisfactory to me, so I made my own Java sscanf utility:

https://github.com/driedler/java-sscanf/tree/master/src/util/sscanf

Here's an example of parsing a hex string:

String buffer = "my hex string: DEADBEEF\n";
Object output[] = Sscanf.scan(buffer, "my hex string: %X\n", 1);

System.out.println("parse count: " + output.length);
System.out.println("hex str1: " + (Long)output[0]);

// Output:
// parse count: 1
// hex str1: 3735928559

For "17-MAR-11 15.52.25.000000000":

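String dateString = "17-MAR-11 15.52.25.000000000";
// Locale.ENGLISH so that "MAR" is recognized regardless of the default locale.
// SSS means milliseconds: a nonzero nine-digit fraction would be misread as that many milliseconds.
SimpleDateFormat format = new SimpleDateFormat("dd-MMM-yy HH.mm.ss.SSS", Locale.ENGLISH);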

try 
{
    Date parsed = format.parse(dateString);
    System.out.println(parsed.toString());
}
catch (ParseException pe)
{
    System.out.println("ERROR: Cannot parse \"" + dateString + "\"");
}
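On Java 8 and later, the same idea can be expressed with java.time instead of SimpleDateFormat. This is a swapped-in alternative, not what the answer above uses, and the class name JavaTimeDemo is hypothetical. It assumes the fraction always has exactly nine digits and that two-digit years belong to the 2000s, which is how the yy pattern letter is resolved.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

// Hypothetical demo class, not part of any library.
class JavaTimeDemo {
    private static final DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()                        // accept "MAR" as well as "Mar"
            .appendPattern("dd-MMM-yy HH.mm.ss.SSSSSSSSS") // nine S = nanosecond fraction
            .toFormatter(Locale.ENGLISH);                  // English month abbreviations

    /** Parses the timestamp, throwing DateTimeParseException if it does not fit the format. */
    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, FORMAT);
    }

    public static void main(String[] args) {
        LocalDateTime t = parse("17-MAR-11 15.52.25.000000000");
        System.out.println("day=" + t.getDayOfMonth() + " month=" + t.getMonth()
                + " year=" + t.getYear() + " nanos=" + t.getNano());
    }
}
```

Unlike the SSS field of SimpleDateFormat, the nine-digit fraction here is read as nanoseconds, so nonzero fractions keep their meaning.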

This is far from being as elegant as a regex solution, but it ought to work.

public static void stringStuffThing(){
    String x = "17-MAR-11 15.52.25.000000000";
    // Split into the date part and the time part
    String y[] = x.split(" ");

    for(String s : y){
        System.out.println(s);
    }
    String date[] = y[0].split("-");
    String values[] = y[1].split("\\.");

    for(String s : date){
        System.out.println(s);
    }
    for(String s : values){
        System.out.println(s);
    }
}

2019 answer: Java's Scanner is flexible for reading a wide range of formats. But if your format has simple {%d, %f, %s} fields, you can scan easily with this small class (~90 lines):

import java.util.ArrayList;

/**
 * Basic C-style string formatting and scanning.
 * The format strings can contain %d, %f and %s codes.
 * @author Adam Gawne-Cain
 */
public class CFormat {
    private static boolean accept(char t, char c, int i) {
        if (t == 'd')
            return "0123456789".indexOf(c) >= 0 || i == 0 && c == '-';
        else if (t == 'f')
            return "-0123456789.+Ee".indexOf(c) >= 0;
        else if (t == 's')
            return Character.isLetterOrDigit(c);
        throw new RuntimeException("Unknown format code: " + t);
    }

    /**
     * Returns string formatted like C, or throws exception if anything wrong.
     * @param fmt format specification
     * @param args values to format
     * @return string formatted like C.
     */
    public static String printf(String fmt, Object... args) {
        int a = 0;
        StringBuilder sb = new StringBuilder();
        int n = fmt.length();
        for (int i = 0; i < n; i++) {
            char c = fmt.charAt(i);
            if (c == '%') {
                char t = fmt.charAt(++i);
                if (t == 'd')
                    sb.append(((Number) args[a++]).intValue());
                else if (t == 'f')
                    sb.append(((Number) args[a++]).doubleValue());
                else if (t == 's')
                    sb.append(args[a++]);
                else if (t == '%')
                    sb.append(t);
                else
                    throw new RuntimeException("Unknown format code: " + t);
            } else
                sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Returns scanned values, or throws exception if anything wrong.
     * @param fmt format specification
     * @param str string to scan
     * @return scanned values
     */
    public static Object[] scanf(String fmt, String str) {
        ArrayList<Object> ans = new ArrayList<>();
        int s = 0;
        int ns = str.length();
        int n = fmt.length();
        for (int i = 0; i < n; i++) {
            char c = fmt.charAt(i);
            if (c == '%') {
                char t = fmt.charAt(++i);
                if (t=='%')
                    c=t;
                else {
                    int s0 = s;
                    while ((s == s0 || s < ns) && accept(t, str.charAt(s), s - s0))
                        s++;
                    String sub = str.substring(s0, s);
                    if (t == 'd')
                        ans.add(Integer.parseInt(sub));
                    else if (t == 'f')
                        ans.add(Double.parseDouble(sub));
                    else
                        ans.add(sub);
                    continue;
                }
            }
            if (str.charAt(s++) != c)
                throw new RuntimeException();
        }
        if (s < ns)
            throw new RuntimeException("Unmatched characters at end of string");
        return ans.toArray();
    }
}

For example, the OP's case can be handled like this:

    // Example of "CFormat.scanf"
    String str = "17-MAR-11 15.52.25.000000000";
    Object[] a = CFormat.scanf("%d-%s-%d %d.%d.%f", str);

    // Pick out scanned fields
    int day = (Integer) a[0];
    String month = (String) a[1];
    int year = (Integer) a[2];
    int hour = (Integer) a[3];
    int min = (Integer) a[4];
    double sec = (Double) a[5];

    // Example of "CFormat.printf"  
    System.out.println(CFormat.printf("Got day=%d month=%s year=%d hour=%d min=%d sec=%f\n", day, month, year, hour, min, sec));

Are you familiar with the concept of regular expressions? Java lets you use regex through the Pattern class. Check this out: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

You can test your String like this:

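// yourRegex is a placeholder for the pattern you want to match
Matcher matcher = Pattern.compile(yourRegex).matcher(yourString);
boolean found = matcher.find();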

and then use the methods provided by Matcher to work with the match, if one was found.

System.in.read() is another option.
