Modify the characters of words in a Java string with punctuation, but keep the positions of said punctuation?
For example, take the following list of Strings, ignoring the quotes:
"Hello"
"Hello!"
"I'm saying Hello!"
"I haven't said hello yet, but I will."
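Now let's say I want to perform some operation on the characters of each word. For example, say I want to reverse the characters but keep the positions of the punctuation. The result would be: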
"olleH"
"olleH!"
"m'I gniyas olleH!"
"I tneva'h dias olleh tey, tub I lliw."
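Ideally, I'd like my code to be independent of the operation performed on the string (another example would be a random shuffling of the letters), and independent of all punctuation, so that hyphens, apostrophes, commas, full stops, en/em dashes, etc. all remain in their original positions after the operation is performed. This may require some form of regular expression.
For this, I'm thinking I should save the indices and characters of all punctuation in a given word, perform the operation, and then re-insert all the punctuation at the correct positions. However, I can't think of a way to do this, or a class to use.
I have a first attempt, but unfortunately it doesn't work with punctuation, which is the key: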
jshell> String str = "I haven't said hello yet, but I will."
str ==> "I haven't said hello yet, but I will."
jshell> Arrays.stream(str.split("\\s+")).map(x -> (new StringBuilder(x)).reverse().toString()).reduce((x, y) -> x + " " + y).get()
$2 ==> "I t'nevah dias olleh ,tey tub I .lliw"
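Does anyone know how I might fix this? Many thanks. There's no need for full working code, maybe just a signpost to an appropriate class I could use to perform this operation.
There's no need to use regex for this, and you certainly shouldn't use split("\\s+"), since you'd lose consecutive whitespace as well as the type of each whitespace character, i.e. the whitespace in the result could be incorrect.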
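As a small illustration of the whitespace problem, here is a sketch (the class and method names are made up for this example) that splits on "\\s+" and rejoins with a single space, the same pattern as the attempt in the question:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SplitWhitespaceDemo {
    // Hypothetical helper: splits on runs of whitespace and rejoins with single spaces.
    static String splitAndRejoin(String input) {
        return Arrays.stream(input.split("\\s+")).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String input = "a  b\tc";            // two spaces, then a tab
        String rejoined = splitAndRejoin(input);
        System.out.println(rejoined);               // a b c
        System.out.println(rejoined.equals(input)); // false
    }
}
```

The double space and the tab are both replaced by a single space, so the original layout cannot be recovered from the result.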
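You also shouldn't use charAt() or anything like it, since that doesn't support letters from the Unicode Supplementary Planes, i.e. Unicode characters that are stored in Java strings as surrogate pairs.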
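To see why charAt() falls short, the following sketch (names are illustrative, not from the answer) counts letters once per UTF-16 code unit and once per code point. It assumes only the standard Character and String APIs:

```java
class CodePointDemo {
    // Counts letters by looking at one UTF-16 code unit at a time, charAt-style.
    static int countLettersByChar(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    // Counts letters by looking at whole code points.
    static int countLettersByCodePoint(String s) {
        return (int) s.codePoints().filter(Character::isLetter).count();
    }

    public static void main(String[] args) {
        // 'a' followed by U+1D4D0, which is stored as a surrogate pair.
        String s = "a" + new String(Character.toChars(0x1D4D0));
        System.out.println(countLettersByChar(s));      // 1
        System.out.println(countLettersByCodePoint(s)); // 2
    }
}
```

Each half of a surrogate pair is not a letter on its own, so the char-based loop misses the supplementary letter entirely.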
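Basic logic: treat each run of non-whitespace characters as a word; within a word, move one cursor in from each end, skip anything that is not a letter, and swap the two letters the cursors stop on.
As Java code, with full Unicode support: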
    public static String reverseLettersOfWords(String input) {
        // Work on code points rather than chars so surrogate pairs stay intact.
        int[] codePoints = input.codePoints().toArray();
        for (int i = 0, start = 0; i <= codePoints.length; i++) {
            // A word ends at whitespace or at the end of the input.
            if (i == codePoints.length || Character.isWhitespace(codePoints[i])) {
                for (int end = i - 1; ; start++, end--) {
                    // Skip non-letters from both ends; they stay where they are.
                    while (start < end && ! Character.isLetter(codePoints[start]))
                        start++;
                    while (start < end && ! Character.isLetter(codePoints[end]))
                        end--;
                    if (start >= end)
                        break;
                    // Swap the two letters.
                    int tmp = codePoints[start];
                    codePoints[start] = codePoints[end];
                    codePoints[end] = tmp;
                }
                start = i + 1;
            }
        }
        return new String(codePoints, 0, codePoints.length);
    }
Test
System.out.println(reverseLettersOfWords("Hello"));
System.out.println(reverseLettersOfWords("Hello!"));
System.out.println(reverseLettersOfWords("I'm saying Hello!"));
System.out.println(reverseLettersOfWords("I haven't said hello yet, but I will."));
System.out.println(reverseLettersOfWords("Works with surrogate pairs: 𝓐𝓑𝓒+𝓓 "));
Output
olleH
olleH!
m'I gniyas olleH!
I tneva'h dias olleh tey, tub I lliw.
skroW htiw etagorrus sriap: 𝓓𝓒𝓑+𝓐
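Note that the special letters at the end are the first four from the "Script (or Calligraphy)", "Bold" column of the Unicode mathematical alphabets, e.g. 𝓐 is the Unicode character 'MATHEMATICAL BOLD SCRIPT CAPITAL A' (U+1D4D0), which in Java is two characters: "\uD835\uDCD0".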
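The surrogate pair can be checked directly. This is a minimal sketch using a made-up helper, toEscapes, that prints each UTF-16 code unit of a string in escape form:

```java
class SurrogatePairDemo {
    // Hypothetical helper: formats every UTF-16 code unit as a backslash-u escape.
    static String toEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Build the string for code point U+1D4D0.
        String a = new String(Character.toChars(0x1D4D0));
        System.out.println(a.length());                      // 2 (code units)
        System.out.println(a.codePointCount(0, a.length())); // 1 (code point)
        System.out.println(toEscapes(a));                    // the two escapes D835 and DCD0
    }
}
```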
UPDATE
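The above implementation is optimized for reversing the letters of a word. To apply an arbitrary operation that mangles the letters of a word, use the following implementation: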
    public static String mangleLettersOfWords(String input) {
        int[] codePoints = input.codePoints().toArray();
        for (int i = 0, start = 0; i <= codePoints.length; i++) {
            if (i == codePoints.length || Character.isWhitespace(codePoints[i])) {
                int wordCodePointLen = 0;
                for (int j = start; j < i; j++)
                    if (Character.isLetter(codePoints[j]))
                        wordCodePointLen++;
                if (wordCodePointLen != 0) {
                    int[] wordCodePoints = new int[wordCodePointLen];
                    for (int j = start, k = 0; j < i; j++)
                        if (Character.isLetter(codePoints[j]))
                            wordCodePoints[k++] = codePoints[j];
                    int[] mangledCodePoints = mangleWord(wordCodePoints.clone());
                    if (mangledCodePoints.length != wordCodePointLen)
                        throw new IllegalStateException("Mangled word is wrong length: '" + new String(wordCodePoints, 0, wordCodePoints.length) + "' (" + wordCodePointLen + " code points)" +
                                                        " vs mangled '" + new String(mangledCodePoints, 0, mangledCodePoints.length) + "' (" + mangledCodePoints.length + " code points)");
                    for (int j = start, k = 0; j < i; j++)
                        if (Character.isLetter(codePoints[j]))
                            codePoints[j] = mangledCodePoints[k++];
                }
                start = i + 1;
            }
        }
        return new String(codePoints, 0, codePoints.length);
    }
    private static int[] mangleWord(int[] codePoints) {
        return mangleWord(new String(codePoints, 0, codePoints.length)).codePoints().toArray();
    }
    private static CharSequence mangleWord(String word) {
        return new StringBuilder(word).reverse();
    }
You can of course replace the hardcoded call to either mangleWord method with a call to a passed-in Function<int[], int[]> or Function<String, ? extends CharSequence> parameter, if needed.
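A minimal sketch of that variant, assuming the Function<int[], int[]> form: the class name WordMangler and the reverse helper are invented for this example, and the loop is a condensed version of the one above rather than a copy of it.

```java
import java.util.Arrays;
import java.util.function.Function;

class WordMangler {
    // Applies `mangler` to the letters of each whitespace-delimited word,
    // leaving every non-letter code point at its original index.
    static String mangleLettersOfWords(String input, Function<int[], int[]> mangler) {
        int[] codePoints = input.codePoints().toArray();
        for (int i = 0, start = 0; i <= codePoints.length; i++) {
            if (i == codePoints.length || Character.isWhitespace(codePoints[i])) {
                // Collect only the letters of the word in [start, i).
                int[] letters = Arrays.stream(codePoints, start, i)
                                      .filter(Character::isLetter)
                                      .toArray();
                if (letters.length != 0) {
                    int[] mangled = mangler.apply(letters.clone());
                    if (mangled.length != letters.length)
                        throw new IllegalStateException("Mangled word is wrong length");
                    // Write the mangled letters back into the letter slots.
                    for (int j = start, k = 0; j < i; j++)
                        if (Character.isLetter(codePoints[j]))
                            codePoints[j] = mangled[k++];
                }
                start = i + 1;
            }
        }
        return new String(codePoints, 0, codePoints.length);
    }

    // Example mangler (hypothetical): reverses the array in place.
    static int[] reverse(int[] cps) {
        for (int lo = 0, hi = cps.length - 1; lo < hi; lo++, hi--) {
            int tmp = cps[lo];
            cps[lo] = cps[hi];
            cps[hi] = tmp;
        }
        return cps;
    }

    public static void main(String[] args) {
        System.out.println(mangleLettersOfWords(
                "I haven't said hello yet, but I will.", WordMangler::reverse));
        // I tneva'h dias olleh tey, tub I lliw.

        // Any other operation can be passed as a lambda, e.g. upper-casing.
        System.out.println(mangleLettersOfWords(
                "Hello, world!", cps -> Arrays.stream(cps).map(Character::toUpperCase).toArray()));
        // HELLO, WORLD!
    }
}
```

Passing the operation as a parameter keeps the word-splitting and punctuation handling in one place, so a new operation only has to return an array of the same length.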
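With that implementation of the mangleWord methods, the result is the same as with the original implementation, but you can now easily implement a different mangling algorithm.
For example, to randomize the letters, simply shuffle the codePoints array: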
    // Requires: import java.util.Random;
    private static int[] mangleWord(int[] codePoints) {
        Random rnd = new Random();
        // Fisher-Yates shuffle, done in place.
        for (int i = codePoints.length - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = codePoints[j];
            codePoints[j] = codePoints[i];
            codePoints[i] = tmp;
        }
        return codePoints;
    }
Sample output
Hlelo
Hlleo!
m'I nsayig oHlel!
I athen'v siad eohll yte, btu I illw.
srWok twih rueoatrsg rpasi: 𝓑𝓒𝓐+𝓓
I suspect there's a more efficient solution, but here's a naive one:
    import java.util.Arrays;
    import java.util.stream.Collectors;

    public class Reverser {
        public String reverseSentence(String sentence) {
            String[] words = sentence.split(" ");
            return Arrays.stream(words).map(this::reverseWord).collect(Collectors.joining(" "));
        }

        private static boolean isWordChar(char ch) {
            return Character.isAlphabetic(ch) || Character.isDigit(ch);
        }

        private String reverseWord(String word) {
            // Collect the letters and digits, then reverse them.
            StringBuilder reversed = new StringBuilder();
            for (int i = 0; i < word.length(); ++i) {
                if (isWordChar(word.charAt(i))) {
                    reversed.append(word.charAt(i));
                }
            }
            reversed.reverse();
            // Rebuild the word: punctuation keeps its index, and each letter
            // slot takes the next reversed character.
            StringBuilder result = new StringBuilder();
            int next = 0;
            for (int i = 0; i < word.length(); ++i) {
                char ch = word.charAt(i);
                result.append(isWordChar(ch) ? reversed.charAt(next++) : ch);
            }
            return result.toString();
        }
    }