
Get first 140 characters of string with special case

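I have a string that is limited to 140 characters. Usually, my code produces more than 140. The string is a set of values in this format: Mxxxx, where x can be any digit and there is no fixed length. So I can have M1, or I can also have M281.

If the string is longer than 140 characters, I want to take the first 140, but if the last value was broken in half, I don't want it in my string at all.

However, I need to keep the second half in some local variable.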

For example, let's say this is the string:

"M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619"

And let's say these are the first 140 characters:

"M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M69"

The last value was M6919, but it was split into M69 and 19.

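The most concise way to put it: if the string is longer than 140 characters, split it, but if the last value in the new string was split in two, remove it from the first part and put it into another string together with the rest of the original string.

There are probably many ways to achieve this. I could use an if or a switch/case and say that if the first letter of the second string is not 'M', I know the value was split and I should remove it from the first string, but does anyone have a cleaner solution?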

private static string CreateSettlmentStringsForUnstructuredField(string settlementsString)
{
    string returnSettlementsString = settlementsString.Replace(", ", " ");

    if (returnSettlementsString.Length > 140)
    {
        returnSettlementsString = returnSettlementsString.Substring(0, 140);
        /*if returnSettlementsString was split in two in a way 
          that the last value was broken in two parts, take that value 
          out of returnSettlementsString and put it in some new 
          string value with the other half of the string.*/
    }
    return returnSettlementsString;
} 
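The check suggested in the question (if the second part does not start with 'M', the cut went through a value) can be written out directly. A minimal sketch, assuming every value starts with M and, as in the question's method, ", " is first replaced by a single space; the class SettlementFieldSketch, the method SplitWithRemainder, and its remainder and limit parameters are hypothetical names for illustration:

```csharp
using System;

public static class SettlementFieldSketch
{
    // Hypothetical variant of the question's method: returns the first part and
    // hands the rest back through `remainder` instead of discarding it.
    public static string SplitWithRemainder(string settlementsString, out string remainder, int limit = 140)
    {
        // Same normalization as the question: ", " becomes a single space.
        string s = settlementsString.Replace(", ", " ");
        remainder = string.Empty;
        if (s.Length <= limit)
            return s;

        string first = s.Substring(0, limit);
        string rest = s.Substring(limit);

        // Every value starts with 'M'. If the rest (ignoring a leading space)
        // starts with anything else, the cut went through the middle of a value.
        string trimmedRest = rest.TrimStart(' ');
        if (trimmedRest.Length > 0 && trimmedRest[0] != 'M')
        {
            int lastSpace = first.LastIndexOf(' ');
            if (lastSpace >= 0) // no space: a single value exceeds the limit and stays cut
            {
                // Move the broken value's first half over to the rest.
                rest = first.Substring(lastSpace) + rest;
                first = first.Substring(0, lastSpace);
            }
        }

        remainder = rest.Trim();
        return first.TrimEnd();
    }
}
```

With the question's sample string the cut happens to be clean: the first part ends with M6753 (139 characters after trimming) and the remainder starts with M6919.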

Something like this might work:

string result;
if (input.Length > 140)
{
    result = input.Substring(0, 140);
    if (input[140] != ',') // ensures we don't drop the last complete word when the character right after the cut (index 140) is a comma
        result = result.Substring(0, result.LastIndexOf(','));
} 
else result = input;

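If the total length is greater, it just takes the first 140 characters. It then searches for the last index of a comma and takes all the characters up to that comma.

The best approach is to split the string into "words" and then reassemble them with a StringBuilder. Untested, rough code would look like this: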

using System.Collections.Generic;
using System.Linq;
using System.Text;

public IEnumerable<string> SplitSettlementStrings(string settlementsString) 
{
    var sb = new StringBuilder();
    foreach (var word in WordsFrom(settlementsString))
    {
        // the ", " separator is only needed between words
        var extraLength = sb.Length == 0 ? word.Length : word.Length + 2;
        if (sb.Length > 0 && sb.Length + extraLength > 140)
        {
            // we'd overflow the 140 char limit, so return this fragment and start a new one
            yield return sb.ToString();
            sb = new StringBuilder();
        }

        if (sb.Length > 0) sb.Append(", ");
        sb.Append(word);
    }

    if (sb.Length > 0)
    {
        // we may have content left in the string builder
        yield return sb.ToString();
    }
}

You'll need something like this to split out the words:

public IEnumerable<string> WordsFrom(string settlementsString) 
{
    // split on commas, then trim to remove whitespace
    return settlementsString.Split(',').Select(x => x.Trim()).Where(x => x.Length > 0);
}

You'd use the whole thing like this:

 var settlementStringsIn140CharLengths = SplitSettlementStrings("M234, M456, M452 ...").ToArray();

Edit

The old-school .NET version would look like this:

public ICollection<string> SplitSettlementStrings(string settlementsString) 
{
    List<string> results = new List<string>();
    StringBuilder sb = new StringBuilder();
    foreach (string word in WordsFrom(settlementsString))
    {
        // the ", " separator is only needed between words
        int extraLength = sb.Length == 0 ? word.Length : word.Length + 2;
        if (sb.Length > 0 && sb.Length + extraLength > 140)
        {
            // we'd overflow the 140 char limit, so store this fragment and start a new one
            results.Add(sb.ToString());
            sb = new StringBuilder();
        }

        if (sb.Length > 0)
        {
            sb.Append(", ");
        }
        sb.Append(word);
    }

    if (sb.Length > 0)
    {
        // we may have content left in the string builder
        results.Add(sb.ToString());
    }
    return results;
}

public ICollection<string> WordsFrom(string settlementsString) 
{
    // split on commas, then trim to remove whitespace
    string[] fragments = settlementsString.Split(',');
    List<string> result = new List<string>();
    foreach (string fragment in fragments) 
    {
        string candidate = fragment.Trim();
        if (candidate.Length > 0) 
        {
            result.Add(candidate);
        }
    } 
    return result;
}

Something like this should work:

string test = "M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619";

if (test.Length > 140)
{
    if (test[140] != ',' && test[140] != ' ') // Last entry was split?
        test = test.Substring(0, test.LastIndexOf(',', 139)); // Take up to but not including the last ','
    else
        test = test.Substring(0, 140).TrimEnd(','); // Clean cut: keep all 140 chars, minus a trailing ','
}

Console.WriteLine(test);

My take, just for fun:

var ssplit = theString.Replace(", ", "#").Split('#');       
var sb = new StringBuilder();
for(int i = 0; i < ssplit.Length; i++)
{
    if(sb.Length + ssplit[i].Length > 138) // 140 minus the ", "
        break;
    if(sb.Length > 0) sb.Append(", ");
    sb.Append(ssplit[i]);
}

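Here I split the string into its Mxxx parts. Then I iterate over those parts until the next one would overflow 140 (or 138, since the ", " separator has to be included in the count).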

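The loop above stops at the first value that would overflow and throws the rest away, while the question also needs that rest. A minimal sketch of the same accumulate-until-full idea that keeps the leftover values, assuming the ", " separator (AccumulateSketch and TakeWhileFits are hypothetical names). Unlike the 138 shortcut, it only counts a separator when one is actually appended, so the first value may use all 140 characters:

```csharp
using System;
using System.Text;

public static class AccumulateSketch
{
    // Hypothetical helper: append whole values while they fit in `limit`,
    // then join whatever is left into the second string.
    public static (string Head, string Rest) TakeWhileFits(string input, int limit = 140)
    {
        const string separator = ", ";
        string[] parts = input.Split(new[] { separator }, StringSplitOptions.None);

        var sb = new StringBuilder();
        int i = 0;
        for (; i < parts.Length; i++)
        {
            // A separator is only needed between values, not before the first one.
            int extra = (sb.Length > 0 ? separator.Length : 0) + parts[i].Length;
            if (sb.Length + extra > limit)
                break;
            if (sb.Length > 0) sb.Append(separator);
            sb.Append(parts[i]);
        }

        // string.Join(separator, array, startIndex, count) joins only the leftover values.
        string rest = string.Join(separator, parts, i, parts.Length - i);
        return (sb.ToString(), rest);
    }
}
```

On the question's sample string this yields a 135-character head ending in M6169 and a rest starting with M6753.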

If you don't want to split the string into a list, I would do the following:

string myString = "M19, M42........";
string result;
int index = 141;

do
{
    //Decrement index to reduce the substring size
    index--;

    //Make the result the new length substring
    result = myString.Substring(0, index);

}while (myString[index] != ','); //Check if our result contains a comma as the next char to check if we're at the end of an entry

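So you basically just take the substring of the original string up to 140 characters and check whether the character at position 141 is a comma, which indicates a "clean" cut. If it isn't, it takes the substring up to 139 and checks whether position 140 is a comma, and so on.

Here is a solution. It works backwards through the string, starting from the 141st character.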
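The backwards search in this loop is what String.LastIndexOf(char, startIndex) already does: it scans from startIndex toward the beginning of the string. A minimal sketch of the same cut using that call instead of the loop, which also returns the text after the cut and guards the two cases where the loop would throw (a string of 140 characters or fewer, and no comma at all). BackwardScanSketch and CutAtComma are hypothetical names:

```csharp
using System;

public static class BackwardScanSketch
{
    // Hypothetical helper: same backwards search as the do/while loop,
    // plus the tail after the cut, which the question wants to keep.
    public static (string Head, string Tail) CutAtComma(string s, int limit = 140)
    {
        if (s.Length <= limit)
            return (s, string.Empty); // the loop's Substring/indexing would throw here

        // Last comma at index <= limit; the head excludes the comma itself.
        int index = s.LastIndexOf(',', limit);
        if (index < 0)
            return (string.Empty, s); // no comma in range; the loop would throw here

        return (s.Substring(0, index), s.Substring(index + 1).TrimStart(' '));
    }
}
```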

using System.Linq; // for terminators.Contains

public static string Normalize(string input, int length)
{
    var terminators = new[] { ',', ' ' };
    if (input.Length <= length)
        return input;

    // index `length` is the first character past the limit (the 141st for length = 140)
    int i = length;
    while (i > 0 && !terminators.Contains(input[i]))
        i = i - 1;

    return input.Substring(0, i).TrimEnd(' ', ',');
}

Normalize(settlementsString, 140);

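This is probably not the most performance-sensitive solution, because of the constant memory allocation for new strings, but it does sound like some kind of one-off raw data input. We could choose to remove "tokens" from the input while it has more than 140 characters: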

const string separator = ", ";

while (input.Length > 140)
{
     int delStartIndex = input.LastIndexOf(separator);
     int delLength = input.Length - delStartIndex;

     input = input.Remove(delStartIndex, delLength);
}
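Because this loop only ever removes from the end, whatever it removed is simply the tail of the original string, so the remainder the question asks for can be recovered afterwards. A minimal sketch under the same ", " separator assumption (RemoveTokensSketch and TrimTokens are hypothetical names); it also stops when no separator is left, where the original loop would call Remove(-1, ...) and throw:

```csharp
using System;

public static class RemoveTokensSketch
{
    // Hypothetical wrapper around the loop above that also returns the removed tail.
    public static (string Head, string Tail) TrimTokens(string original, int limit = 140)
    {
        const string separator = ", ";
        string input = original;

        while (input.Length > limit)
        {
            int delStartIndex = input.LastIndexOf(separator, StringComparison.Ordinal);
            if (delStartIndex < 0)
                break; // a single value longer than `limit`: Head may still exceed the limit

            input = input.Remove(delStartIndex); // drop the last ", value"
        }

        // Everything after the kept head (minus its leading separator) is the tail.
        string tail = input.Length < original.Length
            ? original.Substring(input.Length + separator.Length)
            : string.Empty;
        return (input, tail);
    }
}
```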

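A more performance-oriented approach would be to create an IEnumerable<string> or string[] of the substrings and calculate their total length before joining them. Something like this: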

const string separator = ", ";
var splitInput = input.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries);

var length = splitInput[0].Length;
var targetIndex = 1;

// Add parts while the joined length (including separators) stays within 140
for (; targetIndex < splitInput.Length; targetIndex++)
{
    var nextLength = length + separator.Length + splitInput[targetIndex].Length;
    if (nextLength > 140)
        break;
    length = nextLength;
}

var splitOutput = new string[targetIndex];
Array.Copy(splitInput, 0, splitOutput, 0, targetIndex);

var output = string.Join(separator, splitOutput);

We could even turn it into a nice extension method:

using System;

public static class StringUtils
{
    public static string TrimToLength(this string input, string separator, int targetLength)
    {
        var splitInput = input.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries);

        var length = splitInput[0].Length;
        var targetIndex = 1;

        // Add parts while the joined length (including separators) stays within targetLength
        for (; targetIndex < splitInput.Length; targetIndex++)
        {
            var nextLength = length + separator.Length + splitInput[targetIndex].Length;
            if (nextLength > targetLength)
                break;
            length = nextLength;
        }

        var splitOutput = new string[targetIndex];
        Array.Copy(splitInput, 0, splitOutput, 0, targetIndex);

        return string.Join(separator, splitOutput);
    }
}

And call it like this:

input.TrimToLength(", ", 140);

Or:

input.TrimToLength(separator: ", ", targetLength:140);

I use this:

static string FirstN(string s, int n = 140)
{
    if (string.IsNullOrEmpty(s) || s.Length <= n) return s;
    while (n > 0 && s[n] != ' ' && s[n] != ',') n--;
    return s.Substring(0, n);
}

Working test sample code (with the output in comments):

using System;
namespace ConsoleApplication1
{
    class Program
    {
        static string FirstN(string s, int n = 140)
        {
            if (string.IsNullOrEmpty(s) || s.Length <= n) return s;
            while (n > 0 && s[n] != ' ' && s[n] != ',') n--;
            return s.Substring(0, n);
        }
        static void Main(string[] args)
        {
            var s = FirstN("M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619");

            Console.WriteLine(s.Length); // 136
            Console.WriteLine(s);  //M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169,
        }
    }
}

I hope this helps.

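Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.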

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM