
java regex match words that aren't numbers

I have JSON that is missing its quotes:

{
    data: [{
        timestamp: 1467720920,
        val: {
            min: 6.90,
            max: 7.25,
            avg: 7.22
        },
        temp: {
            min: 75.49,
            max: 75.49,
            avg: 75.49
        },
        gps: {
            lat: 0.707581,
            long: -1.941864,
            hdop: 2.54,
            ttf: 49.4
        }
    }],
    id: A1000049A6248C,
    groupId: HU5PPC1E,
    rssi: -93,
    cell: {
        timestamp: 1467731669,
        rssi: -93,
        lat: 0.735554,
        long: -1.974655
    }
}

I need to put quotes around all of the words to the left of a colon and around all of the words to the right of a colon that aren't purely numbers. So I need quotes around A1000049A6248C but not -1.974655. How do I write a regex to do this in Java? I've tried

json.replaceAll("(\\w+|[+-]([0-9]*[.])?[0-9]+)", "\"$1\"");

which puts every word in quotes. I've also tried something like this to get a word that isn't all numbers: json.replaceAll("\\b(?!\\d*)\\b", "\"$1\"");

Expected format

{
  "data": [
    {
      "timestamp": 1463494202,
      "val": {
        "min": 6.75,
        "max": 7.19,
        "avg": 7.14
      },
      "temp_int": {
        "min": 54.28,
        "max": 54.28,
        "avg": 54.28
      },
      "gps": {
        "lat": 0.711407,
        "long": -1.460091,
        "hdop": 1.42,
        "ttf": 42
      }
    }
  ],
  "id": "A1000049A624D1",
  "groupId": "299F7G5AR",
  "rssi": -83,
  "cell": {
    "timestamp": 1463501353,
    "rssi": -83,
    "lat": 0,
    "long": 0
  }
}

You should use a negative lookahead for 'not a number':

((?![-+]?[0-9]*\.?[0-9])\w+\b)

with \"$0\" as the replacement (written inside a Java string literal).
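As a rough sketch of how that pattern could be applied from Java (the class name QuoteNonNumbers and the method name quote are made up for illustration, and the backslashes are doubled because the pattern sits in a Java string literal):

```java
import java.util.regex.Pattern;

class QuoteNonNumbers {
    // A run of word characters that does not start like a number. The negative
    // lookahead rejects any position where an optional sign, optional digits
    // and an optional dot are followed by a digit.
    private static final Pattern NON_NUMERIC_WORD =
            Pattern.compile("(?![-+]?[0-9]*\\.?[0-9])\\w+\\b");

    // Hypothetical helper: wraps every match in double quotes.
    static String quote(String input) {
        return NON_NUMERIC_WORD.matcher(input).replaceAll("\"$0\"");
    }

    public static void main(String[] args) {
        System.out.println(quote("id: A1000049A6248C, rssi: -93, lat: 0.735554"));
    }
}
```

One thing to watch for: because the lookahead is tested at every position, a value that begins with digits, such as 299F7G5AR from the expected output, is only quoted from its first letter onwards (299"F7G5AR"), so this pattern seems to fit only identifiers that start with a letter.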

Edit: JimmyJames's solution is probably faster, but it still needs a negative lookahead to skip null and boolean values if it is to handle the whole JSON:

\b(?!null|true|false)(\w|\.)*([a-z]|[A-Z])+(\w|\.)*\b

You can try this lookahead regex:

str = str.replaceAll("[\\w-]+(?=\\s*:)", "\"$0\"")
         .replaceAll("(?<=:)\\s*(?!-?\\d+(?:\\.\\d+)?\\s*(?:,|\\r?\\n))([\\w-]+)", "\"$1\"");


(?!-?\\d+(?:\\.\\d+)?\\s*(?:,|\\r?\\n)) is the negative lookahead that asserts we are not matching a positive or negative integer or decimal number.
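To see the two passes working together, here is a small sketch that wraps them in a helper (the class name QuoteKeysThenValues and the method name quote are assumptions for the example, not part of the answer):

```java
class QuoteKeysThenValues {
    // Hypothetical helper combining the two replaceAll passes.
    static String quote(String str) {
        return str
                // Pass 1: quote every key, i.e. a word followed by a colon.
                .replaceAll("[\\w-]+(?=\\s*:)", "\"$0\"")
                // Pass 2: quote a value after a colon unless it is a plain
                // number followed by a comma or a line break.
                .replaceAll("(?<=:)\\s*(?!-?\\d+(?:\\.\\d+)?\\s*(?:,|\\r?\\n))([\\w-]+)", "\"$1\"");
    }

    public static void main(String[] args) {
        String input = "{\n  id: A1000049A6248C,\n  groupId: 299F7G5AR,\n  rssi: -93\n}";
        System.out.println(quote(input));
    }
}
```

Since \\s* sits outside the capture group, the space between the colon and a quoted value is dropped ("id":"A1000049A6248C"), which is still valid JSON. The lookahead only recognises a number that is followed by a comma or a line break, so a number followed directly by a closing brace on the same line would end up quoted as well.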

Assuming a word is a continuous sequence of word (or period) characters with at least one letter, wouldn't it be more efficient to do something like this for your match?

(\w|\.)*([a-z]|[A-Z])+(\w|\.)*

As opposed to finding all words and then excluding the numbers?
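A minimal sketch of this idea in Java follows. It assumes the pattern ends with a trailing * so that characters after the last letter are included in the match. The class and method names are invented for the example, and the second pattern adds the null/true/false guard suggested in the other answer:

```java
import java.util.regex.Pattern;

class QuoteWordsWithLetter {
    // Any run of word or period characters containing at least one letter.
    private static final Pattern WORD_WITH_LETTER =
            Pattern.compile("(\\w|\\.)*([a-z]|[A-Z])+(\\w|\\.)*");

    // Same idea, but skipping words that start with a JSON literal.
    private static final Pattern WORD_NOT_LITERAL =
            Pattern.compile("\\b(?!null|true|false)(\\w|\\.)*([a-z]|[A-Z])+(\\w|\\.)*\\b");

    // Hypothetical helper: quotes every word that contains a letter.
    static String quoteAll(String input) {
        return WORD_WITH_LETTER.matcher(input).replaceAll("\"$0\"");
    }

    // Hypothetical helper: as above, but leaves null, true and false unquoted.
    static String quoteExceptLiterals(String input) {
        return WORD_NOT_LITERAL.matcher(input).replaceAll("\"$0\"");
    }

    public static void main(String[] args) {
        String input = "ok: true, id: A1000049A6248C, groupId: 299F7G5AR, rssi: -93";
        System.out.println(quoteAll(input));
        System.out.println(quoteExceptLiterals(input));
    }
}
```

Unlike the lookahead that excludes numbers, this pattern quotes values that begin with digits, such as 299F7G5AR, in one piece, because the match only needs a letter somewhere inside the word.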
