How to get the last Unicode text element efficiently

Question

How can I get the last Unicode text element of a string without iterating over the entire string? System.Globalization.StringInfo offers two ways, but I suspect both of them enumerate the entire string:

    [TestMethod]
    [TestCategory("Verification")]
    public void GetLastTextElement_TextEndsWithSurrogatePair_GetsSurrogatePair()
    {
        // Arrange
        const string OsmanyaDigitOne = "\U000104A1";
        const string OsmanyaDigitTwo = "\U000104A2";
        const string Target = "abc" + OsmanyaDigitOne + "de" + OsmanyaDigitTwo;

        // Act
        int length = Target.Length;
        string lastSubstring = Target.Substring(length - 1);

        StringInfo stringInfo = new StringInfo(Target);
        int lengthInTextElements = stringInfo.LengthInTextElements;
        string lastTextElement = stringInfo.SubstringByTextElements(lengthInTextElements - 1);

        string lastTextElementInOneExpression = Target.Substring(StringInfo.ParseCombiningCharacters(Target).Last());

        // Assert
        Assert.AreEqual(9, length, @"Wrong length");
        Assert.AreNotEqual(OsmanyaDigitTwo, lastSubstring, @"Unexpectedly got last text element");
        Assert.AreEqual(7, lengthInTextElements, @"Wrong length in text elements");
        Assert.AreEqual(OsmanyaDigitTwo, lastTextElement, @"Wrong last text element");
        Assert.AreEqual(OsmanyaDigitTwo, lastTextElementInOneExpression, @"Wrong last text element");
    }

Answer 1

If you simply mean the last Unicode code point, it is quite easy:

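    // Assumes Target is non-empty. Takes two chars if the string ends in a
    // well-formed surrogate pair, otherwise just the final char.
    string unicode = Target.Length >= 2 && char.IsLowSurrogate(Target, Target.Length - 1) && char.IsHighSurrogate(Target, Target.Length - 2)
        ? Target.Substring(Target.Length - 2, 2)
        : Target.Substring(Target.Length - 1, 1);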

If you mean the last grapheme (the last base code point together with any combining marks that follow it, so potentially multiple code points, such as e + ◌̃), it is more complex.
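
One way to approximate the grapheme case is to walk backwards from the end of the string, skipping combining marks until a non-combining code point is reached. The following is a minimal sketch, not a definitive implementation: `LastTextElementSketch` and `GetLastTextElement` are hypothetical names, and the rule "base code point plus trailing Mn/Mc/Me marks" is a simplifying assumption. It does not implement full extended grapheme clusters (emoji ZWJ sequences, regional indicator pairs, Hangul jamo), so its result can differ from `StringInfo` on such input.

```csharp
using System;
using System.Globalization;

public static class LastTextElementSketch
{
    // Hypothetical helper, not part of the .NET API.
    // Assumption: a text element is one non-combining code point followed by
    // zero or more combining marks (categories Mn, Mc, Me).
    public static string GetLastTextElement(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        int start = text.Length;
        while (start > 0)
        {
            // Step back one code point: two chars for a well-formed
            // surrogate pair, otherwise one char.
            start -= 1;
            if (start > 0
                && char.IsLowSurrogate(text[start])
                && char.IsHighSurrogate(text[start - 1]))
            {
                start -= 1;
            }

            // This overload returns the category of the whole surrogate pair
            // when the index points at its high surrogate.
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(text, start);
            bool isCombiningMark =
                category == UnicodeCategory.NonSpacingMark
                || category == UnicodeCategory.SpacingCombiningMark
                || category == UnicodeCategory.EnclosingMark;

            if (!isCombiningMark)
            {
                break; // Found the base code point of the last element.
            }
        }

        // If the string consists only of combining marks, start is 0 here
        // and the whole string is returned.
        return text.Substring(start);
    }
}
```

The loop only touches the trailing code points of the string, so the cost depends on the length of the last element rather than on the length of the whole string. With the `Target` from the question, `LastTextElementSketch.GetLastTextElement(Target)` should yield `OsmanyaDigitTwo`.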

solution1
0 ACCEPTED 2017-03-30 10:17:46
