Regex in Java - parsing a string array

I have a string array like this:

    String tweetString = ExudeData.getInstance().filterStoppingsKeepDuplicates(tweets.text);
    // get array of words and split
    String[] wordArray = tweetString.split(" ");

After splitting the string, I print the resulting array:

    System.out.println(Arrays.toString(wordArray));

And the output I get is:

[new, single, fallin, dropping, days, artwork, hueshq, production, iseedaviddrums, amp, bigearl7, mix, reallygoldsmith, https, , , t, co, dk5xl4cicm, https, , , t, co, rvqkum0dk7]

What I want is to remove all the instances of commas, https, and single letters like 't' (after using split method above). So I want to end up with this:

[new, single, fallin, dropping, days, artwork, hueshq, production, iseedaviddrums, amp, bigearl7, mix, reallygoldsmith, co, dk5xl4cicm, co, rvqkum0dk7]

I've tried doing replaceAll like this:

    String sanitizedString = tweetString.replaceAll("\\s+", " ").replaceAll(",+", ",");

But that just gave me the same initial output with no changes. Any ideas?

If you are using Java 8, you can split on whitespace and drop the empty strings with a stream:

String[] result = Arrays.stream(tweetString.split("\\s+"))
            .filter(s -> !s.isEmpty())
            .toArray(String[]::new);

"What I want is to remove all the instances of commas, https, and single letters like 't'"

In that case you can either chain multiple filters, as @Andronicus does, or use matches with a regex, like so:

String[] result = Arrays.stream(tweetString.split("\\s+"))
            .filter(s -> !s.matches("https|.|\\s+"))
            .toArray(String[]::new);
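
To make the regex filter easy to try, here is a minimal, self-contained sketch. The class name TweetWordFilter, the method filterWords, and the sample string are assumptions made for illustration (the sample is loosely reconstructed from the output shown in the question). Because matches tests the whole token, the pattern "https|." only rejects the exact word https and tokens that are one character long.

```java
import java.util.Arrays;

class TweetWordFilter {

    // Hypothetical helper: splits on runs of whitespace, then drops
    // empty tokens, the exact token "https", and one-character tokens.
    static String[] filterWords(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(s -> !s.isEmpty())           // "" appears when the input is blank
                .filter(s -> !s.matches("https|."))  // matches() tests the whole token
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Assumed sample input, not the asker's real tweet text.
        String sample = "mix reallygoldsmith https   t co dk5xl4cicm";
        System.out.println(Arrays.toString(filterWords(sample)));
        // prints [mix, reallygoldsmith, co, dk5xl4cicm]
    }
}
```

Splitting on "\\s+" instead of " " is what keeps the empty strings out of the array in the first place; the isEmpty filter only covers blank input.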

You can do something like this:

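String[] filtered = Arrays
    .stream(tweetString.split("[ ,]"))
    .filter(str -> str.length() > 1)       // drops empty strings and single letters
    .filter(str -> !str.equals("https"))   // drops the leftover URL scheme
    .toArray(String[]::new);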
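
A runnable sketch of this split-and-filter idea, with the unwanted words kept in a set so more can be added later. The class name SplitAndFilter, the method clean, the BLOCKED set, and the sample input are all hypothetical. Splitting on "[ ,]" yields empty strings wherever two delimiters are adjacent, and the length check removes those along with single letters.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SplitAndFilter {

    // Assumed block list; extend it with any other tokens you want dropped.
    static final Set<String> BLOCKED = new HashSet<>(Arrays.asList("https", "http"));

    // Hypothetical helper: splits on single spaces or commas, then filters.
    static String[] clean(String text) {
        return Arrays.stream(text.split("[ ,]"))
                .filter(str -> str.length() > 1)        // removes "" and one-letter tokens
                .filter(str -> !BLOCKED.contains(str))  // removes blocked words
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Assumed sample input for illustration.
        System.out.println(Arrays.toString(clean("amp, bigearl7, mix https   t co")));
        // prints [amp, bigearl7, mix, co]
    }
}
```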

Based on my comment, here is a quick solution. (Extend the regex with all of your keywords.)

    private static void replaceFromRegex(final String text) {
        String result = text.replaceAll("https($|\\s)| (?<!\\S)[^ ](?!\\S)", "");
        System.out.println(result);
    }

And then test it:

    public static void main(String[] args) throws Exception {
        replaceFromRegex("new single fallin dropping, , https");
    }

Note: This is just a sample. You will have to extend the regex to handle other cases, such as a string that starts with https followed by a space.
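
One possible way to cover the start-of-string case is to anchor both alternatives with whitespace lookarounds, so a token is removed only when it stands alone. This is a sketch under assumptions, not the answerer's final regex: the class RegexCleaner, the method clean, and the NOISE pattern are hypothetical names, and the whitespace is collapsed in a second step.

```java
import java.util.regex.Pattern;

class RegexCleaner {

    // A whole token (no non-space character on either side) that is either
    // the word "https" or a single non-space character such as "t" or ",".
    private static final Pattern NOISE =
            Pattern.compile("(?<!\\S)(?:https|\\S)(?!\\S)");

    // Hypothetical helper: removes noise tokens, then normalizes spacing.
    static String clean(String text) {
        String withoutNoise = NOISE.matcher(text).replaceAll("");
        return withoutNoise.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(clean("https new single , t co https"));
        // prints new single co
    }
}
```

A comma attached to a word, as in "dropping,", is not a standalone token, so this pattern leaves it in place; stripping punctuation would need a separate step.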
