
Get first 140 characters of string with special case

I have a string that is limited to 140 characters. Usually, my code produces more than 140. The string is a set of values in this format: Mxxxx, where x can be any digit and the length is not fixed. So I can have M1, or I can have M281 as well.

If the string is longer than 140 characters, I want to take the first 140, but if the last value is broken in half, I don't want it in my string at all.

Still, I need to save the second half in some local variable.

For example, let's say this is the string:

"M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619"

And let's say these are the first 140 characters:

"M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M69"

The last value was M6919, but it was split into M69 and 19.

What is the most efficient way to say: split if it's longer than 140, but if the last value in the new string was split in two, remove it from the first part of the string and put it in another string value together with the rest of the original string?

There are probably many ways to accomplish this. I could use if or switch/case statements and say: if the first letter of the second string is not 'M', then I know the value was split and I should remove it from the first string. But does someone have a cleaner solution than that?

private static string CreateSettlmentStringsForUnstructuredField(string settlementsString)
{
    string returnSettlementsString = settlementsString.Replace(", ", " ");

    if (returnSettlementsString.Length > 140)
    {
        // Substring returns a new string, so the result has to be assigned
        returnSettlementsString = returnSettlementsString.Substring(0, 140);
        /* If returnSettlementsString was cut in a way that broke
           the last value in two, take that value out of
           returnSettlementsString and put it in a new string
           value together with the other half of the string. */
    }
    return returnSettlementsString;
}
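The commented step in the question can be filled in with a sketch like the one below, assuming values are always M followed by digits and separated by ", ". SplitAtLimit is a hypothetical name, not something from the question. Rather than testing whether the second part starts with 'M' (a clean cut can also leave it starting with a comma or a space), it checks whether the character right after the limit is a comma; if not, it moves the cut back to the last comma so the broken value ends up in the tail.

```csharp
using System;

// Hypothetical helper: splits a ", "-separated list such as "M5903, M6169, M6753"
// into a head of at most maxLength characters that contains only whole values,
// and a tail with everything that did not fit.
static (string Head, string Tail) SplitAtLimit(string input, int maxLength = 140)
{
    if (input.Length <= maxLength)
        return (input, string.Empty);

    // The cut is clean only if the character right after it is a comma.
    // Otherwise a value was broken (or the head would end in ", "), so move
    // the cut back to the last comma inside the first maxLength characters.
    int cut = input[maxLength] == ',' ? maxLength : input.LastIndexOf(',', maxLength - 1);

    // No comma at all: not even the first value fits, so everything goes to the tail.
    if (cut < 0)
        return (string.Empty, input);

    string head = input.Substring(0, cut);
    string tail = input.Substring(cut).TrimStart(',', ' ');
    return (head, tail);
}

var (first, rest) = SplitAtLimit("M1, M22, M333", 9);
Console.WriteLine(first); // M1, M22
Console.WriteLine(rest);  // M333
```

With the example string from the question, the head is the first 20 values (135 characters) and the tail starts at M6753, so head + ", " + tail gives back the original string.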

Something like this may work:

string result;
if (input.Length > 140)
{
    result = input.Substring(0, 140);
    // If the 141st character (index 140) is a comma, the last value is complete and we keep it
    if (input[140] != ',')
    {
        // Otherwise cut back to the last comma; if there is none, no whole value fits
        int lastComma = result.LastIndexOf(',');
        result = lastComma >= 0 ? result.Substring(0, lastComma) : string.Empty;
    }
}
else result = input;

It simply takes the first 140 characters if the total length is greater. Then it searches for the last comma and keeps all characters up to that comma.
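If you prefer a pattern to index arithmetic, a regular expression can express the same rule. This is a swapped-in technique, not part of the answer above, and TakeWholeValues is a hypothetical name: match the longest prefix of at most 140 characters that is followed by a comma or by the end of the string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the longest prefix of at most maxLength characters
// that ends right before a comma or at the end of the input, i.e. only whole values.
static string TakeWholeValues(string input, int maxLength = 140)
{
    // ^.{0,N} is greedy, so the regex engine backtracks from N characters
    // until the lookahead (?=,|$) sees a comma or the end of the string.
    var match = Regex.Match(input, "^.{0," + maxLength + "}(?=,|$)");
    // No match means the very first value is already longer than maxLength.
    return match.Success ? match.Value : string.Empty;
}

Console.WriteLine(TakeWholeValues("M1, M22, M333", 9)); // M1, M22
```

Because the lookahead requires a comma rather than a space, a cut that falls just after ", " backtracks past the comma too, so the result never ends with a separator.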

Your best bet is to split your string into 'words', then reassemble them using a StringBuilder. The raw code looks like this:

using System.Collections.Generic;
using System.Text;

public IEnumerable<string> SplitSettlementStrings(string settlementsString)
{
    var sb = new StringBuilder();
    foreach (var word in WordsFrom(settlementsString))
    {
        // flush the current chunk if ", " plus this word would overflow the 140 char limit
        if (sb.Length > 0 && sb.Length + 2 + word.Length > 140)
        {
            yield return sb.ToString();
            sb = new StringBuilder();
        }
        // the separator only goes between values, not in front of the first one
        if (sb.Length > 0) sb.Append(", ");
        sb.Append(word);
    }

    if (sb.Length > 0)
    {
        // we may have content left in the string builder
        yield return sb.ToString();
    }
}

You need to split the words using something like this:

public IEnumerable<string> WordsFrom(string settlementsString)
{
    // split on commas, then trim to remove whitespace (Select/Where need using System.Linq)
    return settlementsString.Split(',').Select(x => x.Trim()).Where(x => x.Length > 0);
}

And you'd use the whole thing like this:

var settlementStringsIn140CharLengths = SplitSettlementStrings("M234, M456, M452 ...").ToArray();

EDIT

The old-school .NET version looks like this:

public ICollection<string> SplitSettlementStrings(string settlementsString)
{
    List<string> results = new List<string>();
    StringBuilder sb = new StringBuilder();
    foreach (string word in WordsFrom(settlementsString))
    {
        // store the current chunk if ", " plus this word would overflow the 140 char limit
        if (sb.Length > 0 && sb.Length + 2 + word.Length > 140)
        {
            results.Add(sb.ToString());
            sb = new StringBuilder();
        }
        // the separator only goes between values, not in front of the first one
        if (sb.Length > 0)
        {
            sb.Append(", ");
        }
        sb.Append(word);
    }

    if (sb.Length > 0)
    {
        // we may have content left in the string builder
        results.Add(sb.ToString());
    }
    return results;
}

public ICollection<string> WordsFrom(string settlementsString)
{
    // split on commas, then trim to remove whitespace
    string[] fragments = settlementsString.Split(',');
    List<string> result = new List<string>();
    foreach (string fragment in fragments)
    {
        string candidate = fragment.Trim();
        if (candidate.Length > 0)
        {
            result.Add(candidate);
        }
    }
    return result;
}

Something like this should work:

string test = "M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619";

if (test.Length > 140)
{
    if (test[140] != ',' && test[140] != ' ') // Last entry was split?
        test = test.Substring(0, test.LastIndexOf(',', 139)); // Take up to but not including the last ','
    else
        test = test.Substring(0, 140).TrimEnd(','); // Clean cut: keep 140 chars, minus a trailing ',' if the cut fell just after one
}

Console.WriteLine(test);

My take, just for fun:

var ssplit = theString.Split(new[] { ", " }, StringSplitOptions.None);
var sb = new StringBuilder();
for (int i = 0; i < ssplit.Length; i++)
{
    if (sb.Length + ssplit[i].Length > 138) // 140 minus the ", "
        break;
    if (sb.Length > 0) sb.Append(", ");
    sb.Append(ssplit[i]);
}
var result = sb.ToString();

Here I split the string into the Mxxx parts. Then I iterate through those parts until the next part would overflow 140 (or 138, since the count needs to include the ", " separator).
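The same token loop can also keep the values that did not fit, which the question asks for. This is a sketch assuming the ", " separator, and TakeTokens is a hypothetical name. Instead of the fixed 138 bound it checks the exact length the result would have after appending, so the first value is allowed the full 140 characters.

```csharp
using System;
using System.Text;

// Hypothetical helper: greedily takes whole ", "-separated values while the result
// stays within maxLength characters, and returns the remaining values as a second string.
static (string Head, string Tail) TakeTokens(string input, int maxLength = 140)
{
    const string separator = ", ";
    string[] tokens = input.Split(new[] { separator }, StringSplitOptions.None);
    var sb = new StringBuilder();
    int i = 0;
    for (; i < tokens.Length; i++)
    {
        // Exact length after appending: the separator is only needed when sb is not empty.
        int newLength = sb.Length + (sb.Length > 0 ? separator.Length : 0) + tokens[i].Length;
        if (newLength > maxLength)
            break;
        if (sb.Length > 0) sb.Append(separator);
        sb.Append(tokens[i]);
    }
    // Everything from the first value that did not fit onwards (empty if all fit).
    string tail = string.Join(separator, tokens, i, tokens.Length - i);
    return (sb.ToString(), tail);
}
```

For example, TakeTokens("M1, M22, M333", 7) returns ("M1, M22", "M333"), since appending ", M333" would bring the length to 13.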


If you don't want to split the string into lists, I would do something like the following:

string myString = "M19, M42........";
string result = myString;

if (myString.Length > 140)
{
    int index = 141;
    do
    {
        //Decrement index to reduce the substring size
        index--;

        //Make the result the new length substring
        result = myString.Substring(0, index);

    } while (index > 0 && myString[index] != ','); //Stop when the next char is a comma, i.e. we're at the end of an entry
}

So you're basically just cutting your original string to 140 characters and checking whether the character at position 141 is a comma, which indicates a 'clean' cut. If not, it cuts at 139, checks position 140 for a comma, and so on.

Here is a solution. It processes the string backwards, starting from the 141st character.

public static string Normalize(string input, int length)
{
    var terminators = new[] { ',', ' ' };
    if (input.Length <= length)
        return input;

    // input[length] is the 141st character; walk back until it is a separator
    // (terminators.Contains needs using System.Linq)
    int i = length;
    while (!terminators.Contains(input[i]) && i > 0)
        i = i - 1;

    return input.Substring(0, i).TrimEnd(' ', ',');
}

Normalize(settlementsString, 140);

This is probably not the most performance-sensitive solution, because of the repeated memory allocation for the new strings, but the task does sound like a one-time raw data import of some kind. We can simply remove "tokens" from the end of the input while it has more than 140 characters:

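const string separator = ", ";

while (input.Length > 140)
{
    int delStartIndex = input.LastIndexOf(separator);
    if (delStartIndex < 0)
        break; // a single value longer than 140 chars; there is no separator left to cut at
    int delLength = input.Length - delStartIndex;

    input = input.Remove(delStartIndex, delLength);
}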

A more performance-oriented way would be to create an IEnumerable<string> or a string[] of the substrings and count their total length before joining them. Something along these lines:

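const string separator = ", ";
// Splits on ',' and ' ' individually; RemoveEmptyEntries drops the empty pieces in between
var splitInput = input.Split(separator.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

var length = splitInput[0].Length;
int targetIndex;

// Stop when the limit is exceeded or there are no more values
for (targetIndex = 1; length <= 140 && targetIndex < splitInput.Length; targetIndex++)
    length += separator.Length + splitInput[targetIndex].Length;

// Exclude the value that pushed the length over the limit
if (length > 140)
    targetIndex--;

var splitOutput = new string[targetIndex];
Array.Copy(splitInput, 0, splitOutput, 0, targetIndex);

var output = string.Join(separator, splitOutput);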

We can even turn it into a nice extension method:

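using System;

public static class StringUtils
{
    public static string TrimToLength(this string input, string separator, int targetLength)
    {
        // Splits on each character of separator; RemoveEmptyEntries drops the empty pieces in between
        var splitInput = input.Split(separator.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
        if (splitInput.Length == 0)
            return string.Empty;

        var length = splitInput[0].Length;
        int targetIndex;

        // Stop when the limit is exceeded or there are no more values
        for (targetIndex = 1; length <= targetLength && targetIndex < splitInput.Length; targetIndex++)
            length += separator.Length + splitInput[targetIndex].Length;

        // Exclude the value that pushed the length over the limit
        if (length > targetLength)
            targetIndex--;

        var splitOutput = new string[targetIndex];
        Array.Copy(splitInput, 0, splitOutput, 0, targetIndex);

        return string.Join(separator, splitOutput);
    }
}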

and call it like this:

input.TrimToLength(", ", 140);

or:

input.TrimToLength(separator: ", ", targetLength:140);

I use this:

static string FirstN(string s, int n = 140)
{
    if (string.IsNullOrEmpty(s) || s.Length <= n) return s;
    // walk back from s[n], the first character past the limit, to a space or comma
    while (n > 0 && s[n] != ' ' && s[n] != ',') n--;
    // drop the trailing ',' left over when the cut falls just after a comma
    return s.Substring(0, n).TrimEnd(',', ' ');
}

Working test sample code (with commented output):

using System;
namespace ConsoleApplication1
{
    class Program
    {
        static string FirstN(string s, int n = 140)
        {
            if (string.IsNullOrEmpty(s) || s.Length <= n) return s;
            while (n > 0 && s[n] != ' ' && s[n] != ',') n--;
            return s.Substring(0, n).TrimEnd(',', ' ');
        }
        static void Main(string[] args)
        {
            var s = FirstN("M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619");

            Console.WriteLine(s.Length); // 135
            Console.WriteLine(s);  //M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169
        }
    }
}

I hope this helps.

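Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.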

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM