C# string manipulation to generate a list of strings

I am trying to do some string manipulation for a product import. Unfortunately, I have some duplicate data which, if left in, would assign products to categories that I don't want them assigned to.

I have the following string:

Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2

The outcome I would like is:

Category A>Sub Category 1

Category B

Category C>Sub Category 2

First I split on the (|) which gives me:

Category A

Category A > Sub Category 1

Category B

Category C

Category C > Sub Category 2

I then loop through this list and split on the (>).

But I don't know how to merge the results, for example into Category A\\ Sub Category 1.

Below is the code. This will be used to process approximately 1200 rows, so I am trying to make it as quick as possible.

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        string strProductCategories = "Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2";

        List<string> firstSplitResults = strProductCategories.SplitAndTrim('|');

        List<List<string>> secondSplitResults = new List<List<string>>();

        foreach( string firstSplitResult in firstSplitResults )
        {
            List<string> d = firstSplitResult.SplitAndTrim('>');
            secondSplitResults.Add(d);
        }

       // PrintResults(firstSplitResults);
        PrintResults2(secondSplitResults);
    }

    public static void PrintResults(List<string> results)
    {
        foreach( string value in results)
        {
            Console.WriteLine(value);
        }
    }

    public static void PrintResults2(List<List<string>> results)
    {
        foreach(List<string> parent in results)
        {
            foreach (string value in parent)
            {
                Console.Write(value);
            }

            Console.WriteLine(".....");
        }


    }
}

public static class StringExtensions
{
    public static List<string> SplitAndTrim(this string value, char delimiter)
    {
        // Return an empty list rather than null so callers can iterate safely.
        if (string.IsNullOrWhiteSpace(value))
        {
            return new List<string>();
        }

        return value.Split(delimiter).Select(i => i.Trim()).ToList();
    }
}

Once I have got the list correct I will rejoin the list with the (\\).
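The rejoin step mentioned above can be sketched with string.Join: split one path on (>), trim each segment, and join the segments with a backslash. The helper name ToBackslashPath is hypothetical, and reading (\\) as a single backslash separator is an assumption about the import format.

```csharp
using System;
using System.Linq;

public static class PathJoinSketch
{
    // Hypothetical helper: "Category A > Sub Category 1" -> "Category A\Sub Category 1".
    // Assumes the import expects a single backslash between levels.
    public static string ToBackslashPath(string path)
    {
        var segments = path.Split('>').Select(s => s.Trim());
        return string.Join("\\", segments);
    }

    public static void Main()
    {
        Console.WriteLine(ToBackslashPath("Category A > Sub Category 1")); // Category A\Sub Category 1
        Console.WriteLine(ToBackslashPath("Category B"));                  // Category B
    }
}
```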

Any help would be very useful.

UPDATE

The data is coming from a CSV, so it could have any number of levels.

So for example:

Category A -> THIS IS DATA IS REDUNDANT

Category A > Sub Category 1 -> THIS IS DATA IS REDUNDANT

Category A > Sub Category 1 > Sub Sub Category 1

Category A > Sub Category 1 > Sub Sub Category 2

Would result in :

Category A > Sub Category 1 > Sub Sub Category 1

Category A > Sub Category 1 > Sub Sub Category 2

Simon

You have a good start. You just need to add some code at the end of Main to complete the solution.

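// Iterate over a snapshot so entries can be removed from the original list.
foreach (List<string> i in secondSplitResults.ToList())
{
    if (i.Count == 2)
    {
        // Drop the standalone parent entry (e.g. "Category A") that this path already covers.
        secondSplitResults.RemoveAll(x => x.Count == 1 && x[0] == i[0]);
        // Insert a separator so PrintResults2 prints "Parent/Child".
        i.Insert(1, "/");
    }
}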

PrintResults2(secondSplitResults);
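The snippet above only handles two-level paths. For the n-level case described in the update, one possible sketch compares paths segment by segment and drops any path whose segments form a proper prefix of another path. The names SegmentPrefixSketch, IsProperPrefix and RemoveRedundant are hypothetical, and the input is assumed to be already split and trimmed as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SegmentPrefixSketch
{
    // True when 'shorter' is a proper prefix of 'longer', compared segment by segment.
    static bool IsProperPrefix(List<string> shorter, List<string> longer)
    {
        return shorter.Count < longer.Count
            && shorter.SequenceEqual(longer.Take(shorter.Count));
    }

    // Keeps only the paths that no other path extends.
    public static List<List<string>> RemoveRedundant(List<List<string>> paths)
    {
        return paths
            .Where(p => !paths.Any(other => IsProperPrefix(p, other)))
            .ToList();
    }

    public static void Main()
    {
        var paths = new List<List<string>>
        {
            new List<string> { "Category A" },
            new List<string> { "Category A", "Sub Category 1" },
            new List<string> { "Category A", "Sub Category 1", "Sub Sub Category 1" },
            new List<string> { "Category B" },
        };

        foreach (var p in RemoveRedundant(paths))
        {
            Console.WriteLine(string.Join(">", p));
        }
    }
}
```

This compares every path with every other path, which is quadratic in the number of paths per row. That should be acceptable when each row holds only a handful of categories.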

If the leaf elements you marked as "redundant" are removed, the problem reduces to finding the longest path among the items that share a common prefix:

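using System;
using System.Collections.Generic;
using System.Linq;

// Uses the SplitAndTrim extension method from the question.
class Program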
{
    static void Main(string[] args)
    {
        string pathCase1 = "Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2";
        string pathCase2 = "Category A -> THIS IS DATA IS REDUNDANT|Category A > Sub Category 1 -> THIS IS DATA IS REDUNDANT|Category A > Sub Category 1 > Sub Sub Category 1|Category A > Sub Category 1 > Sub Sub Category 2";
        PrintPaths("case1", ParsePaths(pathCase1));
        PrintPaths("case2", ParsePaths(pathCase2));

        Console.ReadLine();
    }

    private static void PrintPaths(string name, List<string> paths)
    {
        Console.WriteLine(name);
        Console.WriteLine();

        foreach (var item in paths)
        {
            Console.WriteLine(item);
        }

        Console.WriteLine();
    }



    static string NormalizePath(string src)
    {
        // Remove "-> THIS DATA IS REDUNDANT" elements

        int idx = src.LastIndexOf('>');
        if (idx > 0 && src[idx - 1] == '-')
        {
            src = src.Substring(0, idx - 1);
        }

        var parts = src.SplitAndTrim('>');
        return string.Join(">", parts);
    }


    static List<string> ParsePaths(string text)
    {
        var items = text.SplitAndTrim('|');
        for (int i = 0; i < items.Count; ++i)
        {
            items[i] = NormalizePath(items[i]);
        }

        items.Sort();

        var longestPaths = new SortedSet<string>();

        foreach (var s in items)
        {
            int idx = s.LastIndexOf('>');
            if (idx > 0)
            {
                var prefix = s.Substring(0, idx);
                longestPaths.Remove(prefix);
            }

            longestPaths.Add(s);
        }

        return longestPaths.ToList();
    }
}

Output:

case1

Category A>Sub Category 1
Category B
Category C>Sub Category 2

case2

Category A>Sub Category 1>Sub Sub Category 1
Category A>Sub Category 1>Sub Sub Category 2
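The loop above removes only the immediate parent of each path and depends on parents sorting before their children. As a variation, not the code above, here is a minimal sketch that first collects every proper ancestor of every path into a HashSet and then keeps the paths that are not ancestors of anything. The names AncestorSetSketch and LongestPaths are hypothetical, and the sketch assumes the input contains only (|) and (>) separators, without the "->" annotations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AncestorSetSketch
{
    public static List<string> LongestPaths(string text)
    {
        // Normalize each path to "A>B>C" with trimmed segments, dropping empties and duplicates.
        var paths = text.Split('|')
            .Select(p => string.Join(">", p.Split('>').Select(s => s.Trim())))
            .Where(p => p.Length > 0)
            .Distinct()
            .ToList();

        // Record every proper ancestor of every path, e.g. "A" and "A>B" for "A>B>C".
        var ancestors = new HashSet<string>(StringComparer.Ordinal);
        foreach (var path in paths)
        {
            for (int idx = path.IndexOf('>'); idx >= 0; idx = path.IndexOf('>', idx + 1))
            {
                ancestors.Add(path.Substring(0, idx));
            }
        }

        // A path is kept only if no other path extends it.
        return paths.Where(p => !ancestors.Contains(p)).ToList();
    }

    public static void Main()
    {
        string input = "Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2";
        foreach (var path in LongestPaths(input))
        {
            Console.WriteLine(path);
        }
    }
}
```

Because each path is visited once and set lookups are constant time on average, the work grows with the total number of path segments rather than with the square of the number of paths.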

I may have misunderstood the question, but I think this can be done in two lines of code:

https://dotnetfiddle.net/GyDwar

using System;
using System.Linq;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        foreach(var part in getParts("Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2"))
            Console.WriteLine(part);
        Console.WriteLine();

        Console.WriteLine("TEST 2");
        foreach(var part in getParts("Category A > THIS IS DATA IS REDUNDANT|Category A > Sub Category 1 > THIS IS DATA IS REDUNDANT|Category A > Sub Category 1 > Sub Sub Category 1|Category A > Sub Category 1 > Sub Sub Category 2"))
            Console.WriteLine(part);
    }

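    public static List<string> getParts(string stringToParse){
        // Materialize once; a lazy query would re-split the input on every Any() call.
        var parts = stringToParse.Split('|').Select(part => part.Trim()).ToList();
        // Keep a part only if no other part starts with it.
        return parts.Where(part => !parts.Any(comparePart => part != comparePart && comparePart.StartsWith(part, StringComparison.Ordinal))).ToList();
    }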
}

Result:

Category A > Sub Category 1
Category B
Category C > Sub Category 2

TEST 2
Category A > THIS IS DATA IS REDUNDANT
Category A > Sub Category 1 > THIS IS DATA IS REDUNDANT
Category A > Sub Category 1 > Sub Sub Category 1
Category A > Sub Category 1 > Sub Sub Category 2

In other words, keep every part that does not form the beginning of another part.
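One caveat with a plain StartsWith check is that "Category A" also counts as the beginning of an unrelated "Category AB". A possible refinement, sketched here under the assumption that levels are always separated by (>), normalizes the spacing and requires the separator to follow the prefix. The names SeparatorAwareSketch and GetParts are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SeparatorAwareSketch
{
    public static List<string> GetParts(string stringToParse)
    {
        // Normalize spacing so every level boundary is exactly " > ".
        var parts = stringToParse.Split('|')
            .Select(p => string.Join(" > ", p.Split('>').Select(s => s.Trim())))
            .ToList();

        // Drop a part only when another part continues it with a further level.
        return parts
            .Where(part => !parts.Any(other =>
                other.StartsWith(part + " > ", StringComparison.Ordinal)))
            .ToList();
    }

    public static void Main()
    {
        // "Category A" is kept here because "Category AB" is a different category, not a child.
        foreach (var part in GetParts("Category A|Category AB|Category C|Category C > Sub Category 2"))
        {
            Console.WriteLine(part);
        }
    }
}
```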

After you split on the (|), go through the list and count how many times each item occurs within the initial string. If an item occurs more than once, remove it. The resulting list will be what you need. The occurrence count uses the technique from "How would you count occurrences of a string within a string?", which so far looks like the fastest approach.

    string strProductCategories = "Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2";

    List<string> firstSplitResults = strProductCategories.SplitAndTrim('|');

    for (int i = 0; i < firstSplitResults.Count; i++)
    {
        int occCount = (strProductCategories.Length - strProductCategories.Replace(firstSplitResults[i], "").Length) / firstSplitResults[i].Length;
        if (occCount > 1)
        {
            firstSplitResults.RemoveAt(i);
            i--;
        }
    }

    // print result
    for (int i = 0; i < firstSplitResults.Count; i++)
    {
        Console.WriteLine(firstSplitResults[i]);
    }
    Console.ReadLine();
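The Replace-based count used in the loop above can be isolated into a small helper, which also makes its behavior easier to check. This is only a sketch: the names OccurrenceCountSketch and CountOccurrences are hypothetical, and the empty-string guard is an addition that the original loop does not have.

```csharp
using System;

public static class OccurrenceCountSketch
{
    // Counts non-overlapping, ordinal occurrences of 'value' inside 'text'.
    public static int CountOccurrences(string text, string value)
    {
        // Guard against division by zero for empty search strings.
        if (string.IsNullOrEmpty(value))
        {
            return 0;
        }

        return (text.Length - text.Replace(value, "").Length) / value.Length;
    }

    public static void Main()
    {
        string input = "Category A|Category A > Sub Category 1|Category B";
        Console.WriteLine(CountOccurrences(input, "Category A")); // 2
        Console.WriteLine(CountOccurrences(input, "Category B")); // 1

        // Caveat: matching is by substring, so "Category A" also counts inside "Category AB".
        Console.WriteLine(CountOccurrences("Category A|Category AB", "Category A")); // 2
    }
}
```

Because the count is substring-based, a category whose name is the start of a sibling's name would be removed as well, so this approach fits best when category names do not overlap in that way.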
