
Get first 140 characters of string with special case

I have a string that is limited to 140 characters, but my code usually produces more than that. The string is a set of values in the format Mxxxx, where x can be any digit and the number of digits is not fixed, so I can have M1 as well as M281.

If the string is longer than 140 characters, I want to take the first 140, but if the last value is cut in half, I don't want it in my string at all.

Still, I need to save the second half in a local variable.

For example, let's say this is the string:

"M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619"

And let's say that these are the first 140 characters:

"M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M69"

The last value was M6919, but it was split into M69 and 19.

What is the most efficient way to say: split if it's longer than 140, but if the last value in the new string was split in two, remove it from the first part and put it in another string together with the rest of the original string?

There are probably many ways to accomplish this. I could use if or switch/case statements and say that if the first letter of the second string is not 'M', then I know the value was split and I should remove it from the first string, but does anyone have a cleaner solution than that?

private static string CreateSettlmentStringsForUnstructuredField(string settlementsString)
{
    string returnSettlementsString = settlementsString.Replace(", ", " ");

    if (returnSettlementsString.Length > 140)
    {
        // Substring returns a new string, so the result has to be assigned.
        returnSettlementsString = returnSettlementsString.Substring(0, 140);
        /*if returnSettlementsString was split in two in a way 
          that last value was broken in two parts, take that value 
          out of returnSettlementStrings and put it in some new 
          string value with the other half of the string.*/
    }
    return returnSettlementsString;
}

Something like this may work:

string result;
if (input.Length > 140)
{
    result = input.Substring(0, 140);
    // If the 141st character is a comma, the first 140 characters end on a
    // complete value and nothing has to be removed.
    if (input[140] != ',')
        // Assumes there is at least one comma within the first 140 characters.
        result = result.Substring(0, result.LastIndexOf(','));
}
else result = input;

It simply takes the first 140 characters if the total length is greater. Unless the next character is a comma, it then searches for the last comma in that prefix and keeps everything before it.
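
The question also asks for the cut-off part to be kept. A minimal sketch of how the same `LastIndexOf` idea could return both halves is shown below; the class name `SettlementSplitter` and the method `SplitAt` are made up for illustration, and the tuple syntax assumes C# 7 or later.

```csharp
using System;

public static class SettlementSplitter
{
    // Hypothetical helper: returns the part that fits within `limit`
    // characters and the remainder of the input.
    public static (string Head, string Rest) SplitAt(string input, int limit = 140)
    {
        if (string.IsNullOrEmpty(input) || input.Length <= limit)
            return (input ?? string.Empty, string.Empty);

        // Search backwards starting at index `limit`, so a comma sitting
        // right after the first `limit` characters counts as a clean cut.
        int cut = input.LastIndexOf(',', limit);
        if (cut < 0)
            return (string.Empty, input); // no comma found: nothing fits cleanly

        string head = input.Substring(0, cut);              // complete values only
        string rest = input.Substring(cut + 1).TrimStart(); // everything after the comma
        return (head, rest);
    }
}
```

Calling `SplitAt` again on `Rest` would produce the next chunk, so the same helper could be used in a loop if more than one 140-character string is needed.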

Your best bet is to split your string into 'words', then reassemble them using a StringBuilder. Untested raw code would look like this:

public IEnumerable<string> SplitSettlementStrings(string settlementsString) 
{
    var sb = new StringBuilder();
    foreach (var word in WordsFrom(settlementsString))
    {
        // 2 is the length of the ", " separator that precedes the word.
        if (sb.Length > 0 && sb.Length + 2 + word.Length > 140)
        {
            // we'd overflow the 140 char limit, so return this fragment and continue;
            yield return sb.ToString();
            sb = new StringBuilder();
        }
        if (sb.Length > 0)
        {
            sb.Append(", ");
        }
        sb.Append(word);
    }

    if (sb.Length > 0) {
        // we may have content left in the string builder
        yield return sb.ToString();
    }
}

You need to split the words using something like this:

 public IEnumerable<string> WordsFrom(string settlementsString) 
 {
    // split on commas, then trim to remove whitespace;
    return settlementsString.Split(',').Select(x => x.Trim()).Where(x => x.Length > 0);
 }

And you'd use the whole thing like this:

 var settlementStringsIn140CharLengths = SplitSettlementStrings("M234, M456, M452 ...").ToArray();

EDIT

The old-school .NET version looks like this:

public ICollection<string> SplitSettlementStrings(string settlementsString) 
{
    List<string> results = new List<string>();
    StringBuilder sb = new StringBuilder();
    foreach (string word in WordsFrom(settlementsString))
    {
        // 2 is the length of the ", " separator that precedes the word.
        if (sb.Length > 0 && sb.Length + 2 + word.Length > 140)
        {
            // we'd overflow the 140 char limit, so store this fragment and continue;
            results.Add(sb.ToString());
            sb = new StringBuilder();
        }
        if (sb.Length > 0)
        {
            sb.Append(", ");
        }
        sb.Append(word);
    }

    if (sb.Length > 0) {
        // we may have content left in the string builder
        results.Add(sb.ToString());
    }
    return results;
}

 public ICollection<string> WordsFrom(string settlementsString) 
 {
    // split on commas, then trim to remove whitespace;
    string[] fragments = settlementsString.Split(',');
    List<string> result = new List<string>();
    foreach(string fragment in fragments) 
    {
        var candidate = fragment.Trim();
        if (candidate.Length > 0) 
        {
            result.Add(candidate);
        }
    } 
    return result;
 }

Something like this should work:

string test = "M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619";

if (test.Length > 140)
{
    if (test[140] != ',' && test[140] != ' ') // Last entry was split?
        test = test.Substring(0, test.LastIndexOf(',', 139)); // Take up to but not including the last ','
    else
        test = test.Substring(0, 140).TrimEnd(',', ' '); // Clean cut: keep the first 140 characters, minus any trailing separator
}

Console.WriteLine(test);

My take, just for fun:

var ssplit = theString.Replace(", ", "#").Split('#');       
var sb = new StringBuilder();
for(int i = 0; i < ssplit.Length; i++)
{
    if(sb.Length + ssplit[i].Length > 138) // 140 minus the ", "
        break;
    if(sb.Length > 0) sb.Append(", ");
    sb.Append(ssplit[i]);
}

Here I split the string into its Mxxx parts. Then I iterate through those parts until the next one would overflow 140 (or rather 138, since the ", " separator has to be included in the count).
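
The loop above only builds the first part. A rough sketch of how the same split-and-count approach could also keep the values that did not fit is shown below; `SettlementParts` and `TakeWhileFits` are hypothetical names, not part of the original answer, and the tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Linq;

public static class SettlementParts
{
    // Hypothetical helper: Head holds as many whole values as fit within
    // `limit` characters, Rest holds all remaining values.
    public static (string Head, string Rest) TakeWhileFits(string theString, int limit = 140)
    {
        var parts = theString.Split(new[] { ", " }, StringSplitOptions.RemoveEmptyEntries);

        int used = 0;   // length of the head built so far
        int count = 0;  // number of values that fit
        foreach (var part in parts)
        {
            // Every value except the first is preceded by a ", " separator.
            int needed = (count == 0 ? 0 : 2) + part.Length;
            if (used + needed > limit)
                break;
            used += needed;
            count++;
        }

        return (string.Join(", ", parts.Take(count)),
                string.Join(", ", parts.Skip(count)));
    }
}
```

Because the separator is only counted between values, the first value is not penalised by two characters the way it is with the fixed 138 threshold.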


If you don't want to split the string into lists, I would do something like the following:

string myString = "M19, M42........";
string result;
int index = 141;

do
{
    //Decrement index to reduce the substring size
    index--;

    //Make the result the new length substring
    result = myString.Substring(0, index);

}while (myString[index] != ','); //Check if our result contains a comma as the next char to check if we're at the end of an entry

So you're basically just taking the first 140 characters and checking whether the character at position 141 is a comma, which indicates a 'clean' cut. If not, it takes the first 139, checks position 140 for a comma, and so on. Note that this assumes the string is longer than 140 characters.

Here is a solution. It processes the string backwards, starting from the 141st character.

public static string Normalize(string input, int length)
{
    // Contains on an array requires "using System.Linq;".
    var terminators = new[] { ',', ' ' };
    if (input.Length <= length)
        return input;

    // Start at the character just past the limit (index == length) and walk backwards.
    int i = length;
    while (i > 0 && !terminators.Contains(input[i]))
        i = i - 1;

    return input.Substring(0, i).TrimEnd(' ', ',');
}

Normalize(settlementsString, 140);

This is probably not the fastest solution because every iteration allocates a new string, but it sounds like a one-time import of raw data of some kind. We have the option to just remove "tokens" from the end of the input while it has more than 140 characters:

const string separator = ", ";

while (input.Length > 140)
{
     int delStartIndex = input.LastIndexOf(separator);
     int delLength = input.Length - delStartIndex;

     input = input.Remove(delStartIndex, delLength);
}

A more performance-oriented way would be to split the input into a string[] (or some form of IEnumerable<string>) of substrings and count their total length before joining them. Something along these lines:

const string separator = ", ";
var splitInput = input.Split(separator.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

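var length = splitInput[0].Length;
var targetIndex = 1;

// Keep adding values while the next one, plus its separator, still fits.
// The bounds check avoids running past the end of a short input.
while (targetIndex < splitInput.Length &&
       length + separator.Length + splitInput[targetIndex].Length <= 140)
{
    length += separator.Length + splitInput[targetIndex].Length;
    targetIndex++;
}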

var splitOutput = new string[targetIndex];
Array.Copy(splitInput, 0, splitOutput, 0, targetIndex);

var output = string.Join(separator, splitOutput);

We can even make a nice extension method like this:

public static class StringUtils
{
    public static string TrimToLength(this string input, string separator, int targetLength)
    {
        var splitInput = input.Split(separator.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

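        var length = splitInput[0].Length;
        var targetIndex = 1;

        // Keep adding values while the next one, plus its separator, still fits.
        // The bounds check avoids running past the end of a short input.
        while (targetIndex < splitInput.Length &&
               length + separator.Length + splitInput[targetIndex].Length <= targetLength)
        {
            length += separator.Length + splitInput[targetIndex].Length;
            targetIndex++;
        }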

        var splitOutput = new string[targetIndex];
        Array.Copy(splitInput, 0, splitOutput, 0, targetIndex);

        return string.Join(separator, splitOutput);
    }
}

and call it like:

input.TrimToLength(", ", 140);

or:

input.TrimToLength(separator: ", ", targetLength:140);

I use this:

static string FirstN(string s, int n = 140)
{
    if (string.IsNullOrEmpty(s) || s.Length <= n) return s;
    while (n > 0 && s[n] != ' ' && s[n] != ',') n--;
    return s.Substring(0, n);
}

Working test sample code (with commented output):

using System;
namespace ConsoleApplication1
{
    class Program
    {
        static string FirstN(string s, int n = 140)
        {
            if (string.IsNullOrEmpty(s) || s.Length <= n) return s;
            while (n > 0 && s[n] != ' ' && s[n] != ',') n--;
            return s.Substring(0, n);
        }
        static void Main(string[] args)
        {
            var s = FirstN("M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619");

            Console.WriteLine(s.Length); // 136
            Console.WriteLine(s);  //M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169,
        }
    }
}

I hope this helps.

