
Java split String into Array

I have a crawler that extracts data from a website, and I am getting the following String:

    String s = "                 --                 Android 2.3.1 (Gingerbread) --                --                  --                 --                   --                  --                  --                 --                 8" Wide LCD - tela sensível ao toque (resistiva) --                 --                 800 x 600 (4:3) --                --                  --                 --                   --                  --                  --                 --                 1,2 GHz ARM Cortex A8 Core (RK2918) --               --                 4 GB (Memória Flash) e DRAM 512 Mb, DDR3 --                  --                 Slot para cartão Micro SD (Máx. 32 GB) --                --                 Integrado, suporta rotação de tela --                --                 Sim --               --                 Sim --               --                 Suporte a multi idioma: Português, Inglês, Francês, Espanhol, Chinês --                  --                 Navegador para Internet, vídeo, foto e áudio players,e-mail, calculadora, gravador de áudio, suporte a e-book, etc. --               --                   --                  --                  --                 --                 802.11 b/g/n (até 300 Mbps) --               --                 2.1 --               --                 USB 2.0 e Mini USB --                --                   --                  --                  --                 --                 14,65 x 21,50 x 1,45 --                  --                 525g --                   --                  --                 --                 Recarregável, Litium (4700 mAh, 3,7 V) --";

I need to split that String into an array but discard the empty elements, so I did this:

String sr[] = s.split(" -- ");
List<String> list = new ArrayList<String>(Arrays.asList(sr));
list.removeAll(Arrays.asList("", null));

But I keep getting the following result:

[               ,               Android 2.3.1 (Gingerbread),                 ,               ,              ,                ,               ,               ,              ,               8&quot; Wide LCD - tela sensível ao toque (resistiva),              ,               800 x 600 (4:3),                 ,               ,              ,                ,               ,               ,              ,               1,2 GHz ARM Cortex A8 Core (RK2918),                ,               4 GB (Memória Flash) e DRAM 512 Mb, DDR3,               ,               Slot para cartão Micro SD (Máx. 32 GB),                 ,               Integrado, suporta rotação de tela,                 ,               Sim,                ,               Sim,                ,               Suporte a multi idioma: Português, Inglês, Francês, Espanhol, Chinês,               ,               Navegador para Internet, vídeo, foto e áudio players,e-mail, calculadora, gravador de áudio, suporte a e-book, etc.,                ,                ,               ,               ,              ,               802.11 b/g/n (até 300 Mbps),                ,               2.1,                ,               USB 2.0 e Mini USB,                 ,                ,               ,               ,              ,               14,65 x 21,50 x 1,45,               ,               525g,                ,               ,              ,               Recarregável, Litium (4700 mAh, 3,7 V) --]

I want the array to contain only the non-empty entries. My guess is that the Strings aren't really empty and that I am getting some kind of HTML blank characters that I can't get rid of.

After doing s.split("\\s+(--\\s+)+"); the array still keeps the empty entries:

[, Android 2.3.1 (Gingerbread),  ,  ,  ,  ,  , 8&quot; Wide LCD - tela sensível ao toque (resistiva), 800 x 600 (4:3),  ,  ,  ,  ,  , 1,2 GHz ARM Cortex A8 Core (RK2918), 4 GB (Memória Flash) e DRAM 512 Mb, DDR3, Slot para cartão Micro SD (Máx. 32 GB), Integrado, suporta rotação de tela, Sim, Sim, Suporte a multi idioma: Português, Inglês, Francês, Espanhol, Chinês, Navegador para Internet, vídeo, foto e áudio players,e-mail, calculadora, gravador de áudio, suporte a e-book, etc.,  ,  ,  , 802.11 b/g/n (até 300 Mbps), 2.1, USB 2.0 e Mini USB,  ,  ,  , 14,65 x 21,50 x 1,45, 525g,  ,  , Recarregável, Litium (4700 mAh, 3,7 V) --]

You can try this:

String sr[] = s.split("\\s+--\\s+");

Using "\\s+" matches any number of whitespace characters, whereas " " matches exactly one space (if you want only the space character to be taken into account, replace \\s with a literal space character). If you want to avoid all empty elements in the array, try:

String sr[] = s.split("\\s+(--\\s+)+");

Having (--\\s+)+ means that a run of consecutive separators is consumed as a single delimiter, so no empty elements are produced between them.
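As a minimal sketch of the same idea: the pattern below is a variation on the one above, not the answerer's exact regex. It uses \\s* instead of \\s+, so a trailing " --" with nothing after it is also treated as a separator. The class and method names (SplitOnDashes, splitFields) are made up for illustration. split() still returns one empty string at the front when the input starts with a separator, so that is filtered out.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOnDashes {
    // Splits on one or more "--" markers together with any whitespace around them,
    // then drops the empty string that split() leaves when the input starts with a separator.
    static List<String> splitFields(String s) {
        List<String> fields = new ArrayList<>();
        for (String part : s.split("\\s*(--\\s*)+")) {
            if (!part.isEmpty()) {
                fields.add(part);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String s = "   --   Android 2.3.1 (Gingerbread) --   --   --   800 x 600 (4:3) --";
        System.out.println(splitFields(s));
    }
}
```

Note that \\s only matches ASCII whitespace by default, so this does not help if the blanks are some other character.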

I think what you are looking for is String.replace():

String sentence = "Hello World   !";
String str = sentence.replace(" ", "");

System.out.println(str);

Output:

HelloWorld!

You can call String#trim() on the strings in the array, which removes all leading and trailing whitespace.
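One caveat, sketched below under an assumption the question does not confirm: if the "blank" entries come from HTML non-breaking spaces (U+00A0, i.e. &nbsp;), trim() will not remove them, because it only strips characters up to U+0020. The helper name stripAll is hypothetical.

```java
public class TrimDemo {
    // Hypothetical helper: removes leading and trailing whitespace,
    // also treating the non-breaking space U+00A0 as whitespace.
    static String stripAll(String s) {
        return s.replaceAll("^[\\s\\u00A0]+|[\\s\\u00A0]+$", "");
    }

    public static void main(String[] args) {
        String crawled = "\u00A0 Sim \u00A0";
        // trim() leaves the non-breaking spaces in place
        System.out.println("[" + crawled.trim() + "]");
        // stripAll() removes them as well
        System.out.println("[" + stripAll(crawled) + "]");
    }
}
```

If the entries turn out to contain only ordinary spaces, plain trim() is enough.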

To remove all the empty strings and those that only contain whitespace from the list:

Iterator<String> it = list.iterator();
while (it.hasNext()) {
    String s = it.next();
    if (s.matches("^\\s*$")) {
        it.remove();
    }
}
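On Java 8 or later, the same filtering can be sketched with Collection.removeIf instead of an explicit Iterator. This is an alternative to the loop above, not a correction of it; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveBlanks {
    // Returns a copy of the input without null, empty, or whitespace-only entries.
    static List<String> withoutBlanks(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.removeIf(item -> item == null || item.trim().isEmpty());
        return copy;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("", "   ", "Sim", " 2.1 ");
        System.out.println(withoutBlanks(list));
    }
}
```

The surviving entries are kept as they are, so " 2.1 " still carries its surrounding spaces; trim them separately if needed.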

Try this:

    String sr[] = s.split("--");
    List<String> list = new ArrayList<String>(Arrays.asList(sr));
    ArrayList<String> removeList = new ArrayList<String>();
    String curr;
    for (int i=0; i < list.size(); i++) {
        curr = list.get(i).trim();
        list.set(i, curr);
        if (curr.length() == 0)
            removeList.add(curr);
    }
    list.removeAll(removeList);
    System.out.println(list);

ArrayList<String> result = new ArrayList<String>();
String entries[] = s.split("--");
for (String entry : entries) {
    // trim() keeps the spaces inside an entry; replaceAll(" ", "") would delete those too
    String trimmed = entry.trim();
    if (!trimmed.isEmpty()) {
        result.add(trimmed);
    }
}
return result;

The String is split on "--", and each element is then added to the result unless it contains only whitespace.
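The same split, trim, and filter steps can also be written as a stream pipeline on Java 8 or later. This is only a sketch; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SplitTrimFilter {
    // Splits on "--", trims each piece, and keeps only the non-empty ones.
    static List<String> nonBlankEntries(String s) {
        return Arrays.stream(s.split("--"))
                .map(String::trim)
                .filter(entry -> !entry.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(nonBlankEntries("  -- Sim --  -- 2.1 --"));
    }
}
```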

