Java string - split on space, but preserve double space

Currently I am splitting a string by spaces. However, there are some double spaces that I want to preserve when I put them all back together. Any suggestions on how to do this?

I.e., the string "I went to the beach.  I ate pie" is getting split as

I
went
to
the
beach.

I
ate
pie

I don't want the blank entries, but I want to put it back together in the same format. Thanks all!

Do a String replaceAll("  ", " unlikelyCharacterSequence") and then split your string by spaces as normal. Afterwards, replace {unlikelyCharacterSequence} with " " to restore the double space.

However, this will fail if you ever encounter your "unlikely" character sequence in your actual, unmodified String. For a more general-purpose solution, check the alternative listed below this example.

Example (warning: depends on !@#!@# never appearing in the input):

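String example = "Hello.  That was a double space. That was a single space.";
// Mark each double space: keep one space to split on, tag the next word with the placeholder.
String formatted = example.replace("  ", " !@#!@#");
String[] split = formatted.split(" ");
for (int i = 0; i < split.length; i++)
{
  // Strings are immutable, so the result must be assigned back to the array element.
  split[i] = split[i].replace("!@#!@#", " ");
}
// Recombine: joining with single spaces restores the original string (String.join needs Java 8+).
String recombined = String.join(" ", split);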

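Alternatively, you could take a more robust strategy: recombine the string as you have it in your question, and treat the empty elements that a double space produces as stand-ins for the extra space:

String example = "ThisShouldBeTwoElements.  ButItIsNot.";
String[] splitString = example.split(" ");
String recombined = "";
for (int i = 0; i < splitString.length; i++)
{
  // Put back the separator that split() consumed. The empty element
  // produced by a double space contributes the second space.
  if (i > 0)
    recombined += " ";
  // A double space yields an empty element (not " "), so there is no word to append.
  if (!splitString[i].isEmpty())
    recombined += splitString[i];
}

String st = "I went to the beach.  I ate pie";
// Split on a whitespace character that is not followed by more whitespace.
String[] parts = st.split("\\s(?!\\s)");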

This results in

[I, went, to, the, beach. , I, ate, pie]
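Since the question also asks about putting the pieces back together, here is a minimal sketch of the full round trip. The class and method names are made up for illustration, and it assumes the separators are plain spaces and that the input has no trailing whitespace (split() drops trailing empty elements, so that would not survive).

```java
import java.util.Arrays;

class SplitKeepDoubleSpace {
    // Splits on a whitespace character that is not followed by another one,
    // so the extra space of a double space stays attached to the previous word.
    static String[] split(String text) {
        return text.split("\\s(?!\\s)");
    }

    // Rejoins with the single space that split() consumed.
    static String join(String[] parts) {
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String st = "I went to the beach.  I ate pie";
        String[] parts = split(st);
        System.out.println(Arrays.toString(parts)); // [I, went, to, the, beach. , I, ate, pie]
        System.out.println(join(parts).equals(st)); // true
    }
}
```

The word before a double space keeps one trailing space ("beach. "), so trim it if you only need the word itself.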

I also suggest looking at http://docs.oracle.com/javase/6/docs/api/ and/or http://www.regular-expressions.info/java.html so you understand what this is doing.

Take a good look at what Java's regex support can do for you. There's a way to recognize patterns using regex.

Java regex examples

Try this; it should remove every single whitespace character that sits between two non-whitespace characters.

myString = myString.replaceAll("(?<=\\S)\\s(?=\\S)", "");

This will preserve whitespace when it occurs more than once between two words.
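A small sketch of that behaviour, assuming the lookaround form of the pattern. The lookbehind and lookahead check the neighbouring characters without consuming them; a plain \S\s\S would delete the letters on either side as well. The class and method names are hypothetical.

```java
class RemoveSingleSpaces {
    // Removes a whitespace character only when both neighbours are non-whitespace.
    // Runs of two or more whitespace characters are left untouched.
    static String removeSingleSpaces(String text) {
        return text.replaceAll("(?<=\\S)\\s(?=\\S)", "");
    }

    public static void main(String[] args) {
        String input = "I went to the beach.  I ate pie";
        System.out.println(removeSingleSpaces(input)); // Iwenttothebeach.  Iatepie
    }
}
```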

I know this is an old question, but for the benefit of future audiences: the concept you're looking for is "capturing groups". Capturing groups allow you to refer to matches in your expression and retrieve them later, such as via a back-reference, instead of the strings being swallowed.
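To make "retrieve them later" concrete, here is a rough sketch that uses two capturing groups with Matcher instead of split(). The pattern, class name and method name are assumptions for illustration; it also assumes the text does not start with whitespace, since leading whitespace would be skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordAndGap {
    // Group 1 captures a word, group 2 the whitespace that follows it (possibly empty).
    private static final Pattern TOKEN = Pattern.compile("(\\S+)(\\s*)");

    // Returns the words and the gaps after them as alternating entries.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            out.add(m.group(1)); // the word
            out.add(m.group(2)); // the whitespace that followed it, kept verbatim
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = tokens("I went to the beach.  I ate pie");
        // Nothing was discarded, so plain concatenation rebuilds the input.
        System.out.println(String.join("", parts)); // I went to the beach.  I ate pie
    }
}
```

Because every gap is stored exactly as it appeared, the words sit at the even indices and the original spacing at the odd ones.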

From the docs, here's the relevant syntax you need to know:

(?<name>X)          X, as a named-capturing group
(?:X)               X, as a non-capturing group
(?idmsuxU-idmsuxU)  Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X)  X, as a non-capturing group with the given flags i d m s u x on - off
(?=X)               X, via zero-width positive lookahead
(?!X)               X, via zero-width negative lookahead
(?<=X)              X, via zero-width positive lookbehind
(?<!X)              X, via zero-width negative lookbehind
(?>X)               X, as an independent, non-capturing group

Using the input text:

String example = "ABC     DEF     GHI J K";

You can combine a positive lookbehind with a negative lookahead to keep the trailing whitespace attached to each word:

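// Result: [ABC     , DEF     , GHI , J , K]
// A single \\s in the lookbehind matches the same positions as \\s+ and avoids
// the unbounded lookbehind that older Java versions may reject.
example.split("(?<=\\s)(?!\\s)");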

Or you can split on word boundaries with a positive lookahead to keep the spaces as separate, grouped elements:

// Result: [ABC,      , DEF,      , GHI,  , J,  , K]
example.split("(?=\\b)");
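Both variants split at zero-width positions, so no characters are consumed and concatenating the pieces should give back the original text. A minimal sketch of that check follows. The class and method names are hypothetical, and the first pattern uses the bounded lookbehind (?<=\\s), which matches the same positions as the \\s+ form.

```java
import java.util.Arrays;

class LookaroundSplitDemo {
    // Splits after each run of whitespace, so the run stays attached to the preceding word.
    static String[] withTrailingSpace(String text) {
        return text.split("(?<=\\s)(?!\\s)");
    }

    // Splits at every word boundary, so each run of whitespace becomes its own element.
    static String[] atWordBoundaries(String text) {
        return text.split("(?=\\b)");
    }

    public static void main(String[] args) {
        String example = "ABC     DEF     GHI J K";
        String[] attached = withTrailingSpace(example);
        String[] separate = atWordBoundaries(example);
        System.out.println(Arrays.toString(attached)); // [ABC     , DEF     , GHI , J , K]
        // Zero-width splits consume nothing, so plain concatenation restores the input.
        System.out.println(String.join("", attached).equals(example)); // true
        System.out.println(String.join("", separate).equals(example)); // true
    }
}
```

Note that \b treats punctuation as a boundary too, so with the second variant a token like "beach." comes back as "beach" plus a separate element for the period and the spaces after it.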

Java Pattern API:
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html



Side note: While the "replace the text with something completely implausible" suggestion is tempting because it's easy, don't ever do that in production code. It will fail eventually, and it happens more often than you'd think. I debugged a call center after a programmer used about 80 columns of "~=$~=$~=$..." believing that was safe. That lasted a couple of months until a service rep saved a "fancy border" on his notes with just that sequence. I've even witnessed a genuine, random MD5 collision on a search server. Granted, the MD5 collision took 11 years, but it still crashed the search and the point remains. Unique strings never are. Always assume that duplicates will appear.
