
How do I parse a string to get specific information using Java?

Here are some lines from a file. I'm not sure how to parse them to extract four pieces of information.

11::American President, The (1995)::Comedy|Drama|Romance
12::Dracula: Dead and Loving It (1995)::Comedy|Horror
13::Balto (1995)::Animation|Children's
14::Nixon (1995)::Drama

I would like to get the number, title, release date, and genres. A line can list several genres, so I would like to save each one in its own variable as well.

I'm using .split("::|\\|") to parse each line, but I'm not able to extract the release date.

Can anyone help me?

The easiest way would be to match each line with a regex, something like this:

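  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  String x = "11::Title (2016)::Category";
  // Groups: 1 = number, 2 = title, 3 = four-digit year, 4 = categories
  Pattern p = Pattern.compile("^([0-9]+)::([a-zA-Z ]+)\\(([0-9]{4})\\)::([a-zA-Z]+)$");
  Matcher m = p.matcher(x);
  if (m.find()) {
    // group(2) includes the space before "(", so trim it
    System.out.println("Number: " + m.group(1) + " Title: " + m.group(2).trim() + " Year: " + m.group(3) + " Categories: " + m.group(4));
  }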

(Please don't hold me to the exact syntax; this is off the top of my head.)

The first capture group is the number, the second is the title, the third is the year, and the fourth is the set of categories, which you can then split on '|'.

You may need to adjust the valid characters for title and categories, but you should get the idea.

If you have multiple lines, split them into an ArrayList first and treat each one separately in a loop.
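
As a rough sketch of the adjustments described above, the pattern below loosens the title and category groups so that commas, colons, apostrophes, and '|' are accepted, then splits the categories and loops over several lines. The class name `MovieLineParser`, the nested `Movie` type, and the `parse` method are hypothetical names chosen for illustration; they are not part of the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; all names here are illustrative assumptions.
final class MovieLineParser {

    // Simple holder for the four pieces of information on one line.
    static final class Movie {
        final String number;
        final String title;
        final int year;
        final List<String> genres;

        Movie(String number, String title, int year, List<String> genres) {
            this.number = number;
            this.title = title;
            this.year = year;
            this.genres = genres;
        }
    }

    // Groups: 1 = number, 2 = title, 3 = four-digit year, 4 = raw genre list.
    // The lazy (.*?) accepts any characters in the title and stops at the
    // first "(dddd)::" it can find, so the space before "(" is not captured.
    private static final Pattern LINE =
            Pattern.compile("^(\\d+)::(.*?)\\s*\\((\\d{4})\\)::(.*)$");

    // Returns the parsed movie, or null when the line does not fit the pattern.
    static Movie parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // "|" is a regex metacharacter, so it must be escaped for split().
        List<String> genres = Arrays.asList(m.group(4).split("\\|"));
        return new Movie(m.group(1), m.group(2), Integer.parseInt(m.group(3)), genres);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "11::American President, The (1995)::Comedy|Drama|Romance",
                "12::Dracula: Dead and Loving It (1995)::Comedy|Horror",
                "13::Balto (1995)::Animation|Children's",
                "14::Nixon (1995)::Drama");
        // Treat each line separately in a loop.
        for (String line : lines) {
            Movie movie = parse(line);
            if (movie != null) {
                System.out.println(movie.number + " | " + movie.title + " | "
                        + movie.year + " | " + movie.genres);
            }
        }
    }
}
```

Returning null for a non-matching line is just one option; throwing an exception or skipping the line with a log message would work equally well, depending on how strict the input format is.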

Try this:

import java.util.Arrays;

String[] s = {
    "11::American President, The (1995)::Comedy|Drama|Romance",
    "12::Dracula: Dead and Loving It (1995)::Comedy|Horror",
    "13::Balto (1995)::Animation|Children's",
    "14::Nixon (1995)::Drama",
};
for (String e : s) {
    // Split on "::", on optional whitespace followed by "(", or on ")::"
    String[] infos = e.split("::|\\s*\\(|\\)::");
    String number = infos[0];
    String title = infos[1];
    String releaseDate = infos[2];
    // "|" is a regex metacharacter, so escape it
    String[] genres = infos[3].split("\\|");
    System.out.printf("number=%s title=%s releaseDate=%s genres=%s%n",
          number, title, releaseDate, Arrays.toString(genres));
}

Output:

number=11 title=American President, The releaseDate=1995 genres=[Comedy, Drama, Romance]
number=12 title=Dracula: Dead and Loving It releaseDate=1995 genres=[Comedy, Horror]
number=13 title=Balto releaseDate=1995 genres=[Animation, Children's]
number=14 title=Nixon releaseDate=1995 genres=[Drama]
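
One thing to watch with the split above: the pattern also breaks at any earlier "(" inside a title, and indexing `infos[3]` throws on a line that has fewer fields. A possible variation, sketched below under the assumption that the year is always the last parenthesized group before the second "::", splits on "::" first and then locates the year with `lastIndexOf`. The class `SplitMovieParser` and its method names are hypothetical, and the second sample line in `main` is made-up input for illustration.

```java
import java.util.Arrays;

// Hypothetical helper; names are illustrative and not taken from the answer above.
final class SplitMovieParser {

    // Returns {number, title, releaseDate, rawGenres}.
    // Throws IllegalArgumentException when the line does not have the expected shape.
    static String[] parseFields(String line) {
        // Limit of 3 keeps everything after the second "::" in one piece.
        String[] parts = line.split("::", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 '::'-separated fields: " + line);
        }
        String titleAndYear = parts[1];
        // Assumption: the release year sits in the LAST pair of parentheses.
        int open = titleAndYear.lastIndexOf('(');
        int close = titleAndYear.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("No '(year)' found in: " + line);
        }
        String title = titleAndYear.substring(0, open).trim();
        String releaseDate = titleAndYear.substring(open + 1, close).trim();
        return new String[] { parts[0], title, releaseDate, parts[2] };
    }

    // Splits the raw genre field on the literal '|' character.
    static String[] genres(String rawGenres) {
        return rawGenres.split("\\|");
    }

    public static void main(String[] args) {
        String[] samples = {
            "13::Balto (1995)::Animation|Children's",
            // Made-up line with an extra parenthesized part in the title.
            "99::Some Title (Alternate Name) (2001)::Drama",
        };
        for (String line : samples) {
            String[] f = parseFields(line);
            System.out.printf("number=%s title=%s releaseDate=%s genres=%s%n",
                    f[0], f[1], f[2], Arrays.toString(genres(f[3])));
        }
    }
}
```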

