
String Split After Second Space Character

I have a string that I download with Jsoup that looks like this:

Paul Millsap Al Horford Tiago Splitter Jeff Teague Kyle Korver Thabo Sefolosha Mike Scott Shelvin Mack Kent Bazemore Dennis Schröder Tim Hardaway Jr. Walter Tavares Justin Holiday Mike Muscala Lamar Patterson Terran Petteway

I want to split it into an array for use in a list view, so the desired output would be:

Paul Millsap, Al Horford, Tiago Splitter, Jeff Teague, Kyle Korver, Thabo Sefolosha, Mike Scott, Shelvin Mack, Kent Bazemore, Dennis Schröder, Tim Hardaway Jr., Walter Tavares, Justin Holiday, Mike Muscala, Lamar Patterson, Terran Petteway,

How can I do this? Thanks for any help.

You could do a basic "split on every second space", then examine the next string to see whether it contains anything (such as a period) indicating that it belongs to the previous string. This works as long as suffixes like Jr. carry the period; it would not work if the punctuation is missing.
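One way to read that suggestion, as a rough sketch rather than a definitive implementation: walk the whitespace-separated tokens, pair them up, and attach any token ending in a period to the name that was just completed. The class and method names (PairSplitter, splitNames) are made up for illustration, and the code assumes every name is exactly two words plus an optional dotted suffix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class PairSplitter {

    // Groups tokens into "First Last" pairs; a token ending in a period
    // (e.g. "Jr.") is appended to the previously completed name instead.
    static List<String> splitNames(String input) {
        List<String> names = new ArrayList<>();
        String pending = null; // first word of a name that is not complete yet
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for blank input
            }
            if (token.endsWith(".") && pending == null && !names.isEmpty()) {
                // Suffix such as "Jr." belongs to the name just completed.
                int last = names.size() - 1;
                names.set(last, names.get(last) + " " + token);
            } else if (pending == null) {
                pending = token;
            } else {
                names.add(pending + " " + token);
                pending = null;
            }
        }
        if (pending != null) {
            names.add(pending); // odd trailing word, kept as-is
        }
        return names;
    }

    public static void main(String[] args) {
        String input = "Kent Bazemore Tim Hardaway Jr. Walter Tavares";
        for (String name : splitNames(input)) {
            System.out.println(name);
        }
    }
}
```

As noted above, a suffix written without the period (plain "Jr") would be treated as the first word of the next name, so this only holds up on data shaped like the example.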

Preferred answer:

Since you are parsing a page that has a well-structured table, and you want the values from a specific column (the player names, which are also links), you can do it easily with:

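import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String url = "http://www.spotrac.com/nba/atlanta-hawks/cap/";
Document doc = Jsoup.connect(url).get(); // throws IOException
// Links inside td.player cells of the table with class "datatable"
Elements players = doc.select("table.datatable td.player a");
for (Element player : players) {
    System.out.println(player.text());
}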

which will:

  • find the table element with class datatable
  • within that table, select td.player, which matches each td cell with class player
  • finally, pick the links (a elements) inside those cells, since the names are links

Original answer:

Based only on the example data from your question, you could try to find OneWord[space]SecondWord(optional:[space]Jr.).

Code based on this idea could look like:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = " Paul Millsap Al Horford Tiago Splitter Jeff Teague Kyle Korver Thabo Sefolosha Mike Scott Shelvin Mack Kent Bazemore Dennis Schröder Tim Hardaway Jr. Walter Tavares Justin Holiday Mike Muscala Lamar Patterson Terran Petteway";
// UNICODE_CHARACTER_CLASS lets \w match letters such as "ö" in Schröder
Pattern p = Pattern.compile("\\w+\\s+\\w+(\\s+Jr[.])?",
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println(m.group());
}

Output:

Paul Millsap
Al Horford
Tiago Splitter
Jeff Teague
Kyle Korver
Thabo Sefolosha
Mike Scott
Shelvin Mack
Kent Bazemore
Dennis Schröder
Tim Hardaway Jr.
Walter Tavares
Justin Holiday
Mike Muscala
Lamar Patterson
Terran Petteway
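
Since the question asks for an array to feed a list view, the same matching loop could collect the names instead of printing them. This is only a sketch: it reuses the pattern from the snippet above, while the class and method names (NameMatcher, extractNames) are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern shown in the answer.
final class NameMatcher {

    // Two words plus an optional "Jr." suffix, as in the original snippet.
    private static final Pattern NAME = Pattern.compile(
            "\\w+\\s+\\w+(\\s+Jr[.])?",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);

    // Returns every match of NAME in the input, in order of appearance.
    static List<String> extractNames(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(input);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        // The non-ASCII letter is written as a Unicode escape so the
        // source file stays ASCII-only.
        String input = " Dennis Schr\u00f6der Tim Hardaway Jr. Walter Tavares";
        String[] asArray = extractNames(input).toArray(new String[0]);
        System.out.println(asArray.length + " names found");
    }
}
```

The resulting String[] (or the List itself) can then be handed to whatever adapter backs the list view.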

Search for two words, then match a third word only if it ends in a . character:

\b(\w+ \w+\b(?: \w+\.)?)

Replace each match with \\1, (the captured name followed by a comma). (regex101.com example)
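
In Java, that replacement might be sketched as follows. Two things here are assumptions rather than part of the answer: Java's replaceAll refers to the captured group as $1 rather than \1, and UNICODE_CHARACTER_CLASS is added so that \w and \b treat letters like "ö" as word characters. The names CommaJoiner, addCommas and toArray are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the search-and-replace idea in Java.
final class CommaJoiner {

    // Two words, plus a third word only when it ends in a period.
    private static final Pattern NAME = Pattern.compile(
            "\\b(\\w+ \\w+\\b(?: \\w+\\.)?)",
            Pattern.UNICODE_CHARACTER_CLASS);

    // Appends a comma after every matched name.
    static String addCommas(String input) {
        return NAME.matcher(input).replaceAll("$1,");
    }

    // Splits the comma-terminated string into an array of names;
    // String.split drops the trailing empty element.
    static String[] toArray(String input) {
        return addCommas(input.trim()).split(",\\s*");
    }

    public static void main(String[] args) {
        String input = "Tim Hardaway Jr. Walter Tavares Justin Holiday";
        System.out.println(addCommas(input));
        System.out.println(toArray(input).length);
    }
}
```

Note that the pattern expects single spaces between words, exactly as in the regex above; input with irregular whitespace would need normalizing first.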
