I am trying to do some string manipulation for a product import. Unfortunately I have some duplicate data which, if left in, would assign products to categories that I don't want them assigned to.
I have the following string :
Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2
The outcome I would like is:
Category A>Sub Category 1
Category B
Category C>Sub Category 2
First I split on the (|) which gives me:
Category A
Category A > Sub Category 1
Category B
Category C
Category C > Sub Category 2
I then loop through this list and split on the (>).
But I don't know how to merge the results, for example Category A\\ Sub Category 1.
Below is the code. This will be used to process approx. 1200 rows, so I am trying to make it as quick as possible.
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args)
{
string strProductCategories = "Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2";
List<string> firstSplitResults = strProductCategories.SplitAndTrim('|');
List<List<string>> secondSplitResults = new List<List<string>>();
foreach( string firstSplitResult in firstSplitResults )
{
List<string> d = firstSplitResult.SplitAndTrim('>');
secondSplitResults.Add(d);
}
// PrintResults(firstSplitResults);
PrintResults2(secondSplitResults);
}
public static void PrintResults(List<string> results)
{
foreach( string value in results)
{
Console.WriteLine(value);
}
}
public static void PrintResults2(List<List<string>> results)
{
foreach(List<string> parent in results)
{
foreach (string value in parent)
{
Console.Write(value);
}
Console.WriteLine(".....");
}
}
}
public static class StringExtensions
{
public static List<string> SplitAndTrim(this string value, char delimiter)
{
if( string.IsNullOrWhiteSpace( value))
{
// Return an empty list rather than null so callers can iterate safely.
return new List<string>();
}
return value.Split(delimiter).Select(i => i.Trim()).ToList();
}
}
Once I have got the list correct I will rejoin the list with the (\\).
Any help would be very useful.
UPDATE
The data is coming from a CSV so it could have n number of levels.
So for example :
Category A -> THIS DATA IS REDUNDANT
Category A > Sub Category 1 -> THIS DATA IS REDUNDANT
Category A > Sub Category 1 > Sub Sub Category 1
Category A > Sub Category 1 > Sub Sub Category 2
Would result in :
Category A > Sub Category 1 > Sub Sub Category 1
Category A > Sub Category 1 > Sub Sub Category 2
Simon
You have a good start; you just need to add some code at the end to complete the solution.
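// Collect the parents of all two-level entries, then drop the single-level duplicates.
// (Removing inside the foreach would modify the list while it is being enumerated.)
var parents = new HashSet<string>(secondSplitResults.Where(r => r.Count == 2).Select(r => r[0]));
secondSplitResults.RemoveAll(r => r.Count == 1 && parents.Contains(r[0]));
foreach( List<string> i in secondSplitResults )
{
if (i.Count == 2)
{
i.Insert(1,"/");
}
}
PrintResults2(secondSplitResults);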
If the leaf elements you marked as "redundant" are removed, the problem reduces to finding the longest path among items that share a common prefix:
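using System;
using System.Collections.Generic;
using System.Linq;
// Uses the SplitAndTrim extension method from the question.
class Program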
{
static void Main(string[] args)
{
string pathCase1 = "Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2";
string pathCase2 = "Category A -> THIS IS DATA IS REDUNDANT|Category A > Sub Category 1 -> THIS IS DATA IS REDUNDANT|Category A > Sub Category 1 > Sub Sub Category 1|Category A > Sub Category 1 > Sub Sub Category 2";
PrintPaths("case1", ParsePaths(pathCase1));
PrintPaths("case2", ParsePaths(pathCase2));
Console.ReadLine();
}
private static void PrintPaths(string name, List<string> paths)
{
Console.WriteLine(name);
Console.WriteLine();
foreach (var item in paths)
{
Console.WriteLine(item);
}
Console.WriteLine();
}
static string NormalizePath(string src)
{
// Remove "-> THIS DATA IS REDUNDANT" elements
int idx = src.LastIndexOf('>');
if (idx > 0 && src[idx - 1] == '-')
{
src = src.Substring(0, idx - 1);
}
var parts = src.SplitAndTrim('>');
return string.Join(">", parts);
}
static List<string> ParsePaths(string text)
{
var items = text.SplitAndTrim('|');
for (int i = 0; i < items.Count; ++i)
{
items[i] = NormalizePath(items[i]);
}
items.Sort();
var longestPaths = new SortedSet<string>();
foreach (var s in items)
{
int idx = s.LastIndexOf('>');
if (idx > 0)
{
var prefix = s.Substring(0, idx);
longestPaths.Remove(prefix);
}
longestPaths.Add(s);
}
return longestPaths.ToList();
}
}
Output:
case1
Category A>Sub Category 1
Category B
Category C>Sub Category 2
case2
Category A>Sub Category 1>Sub Sub Category 1
Category A>Sub Category 1>Sub Sub Category 2
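Note that ParsePaths above removes only the immediate parent of each path, so an input such as "A|A > B > C" (with no "A > B" entry) would keep "A". The following is a minimal sketch of a variant that collects every ancestor prefix first and therefore needs no sorting. The class and method names (`CategoryPaths`, `KeepLeafPaths`) are hypothetical, and it assumes that entries are separated by '|' and levels by '>', as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CategoryPaths
{
    // Hypothetical helper: returns only the paths that are not an ancestor of another path.
    public static List<string> KeepLeafPaths(string text)
    {
        // Normalize each entry to "A>B>C" so spacing differences do not matter.
        var paths = text.Split('|')
            .Select(p => string.Join(">", p.Split('>').Select(s => s.Trim())))
            .Where(p => p.Length > 0)
            .Distinct()
            .ToList();

        // Collect every proper ancestor of every path ("A" and "A>B" for "A>B>C").
        var ancestors = new HashSet<string>();
        foreach (var path in paths)
        {
            for (int idx = path.IndexOf('>'); idx >= 0; idx = path.IndexOf('>', idx + 1))
            {
                ancestors.Add(path.Substring(0, idx));
            }
        }

        // Keep the input order; drop anything that is an ancestor of something else.
        return paths.Where(p => !ancestors.Contains(p)).ToList();
    }
}
```

Calling `CategoryPaths.KeepLeafPaths(pathCase1)` should give the same three lines as case1 above. The hash set keeps the work roughly proportional to the total number of levels, which may matter little for 1200 rows but avoids the sort.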
I may have misunderstood the question, but maybe I did this in 2 lines of code:
https://dotnetfiddle.net/GyDwar
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
foreach(var part in getParts("Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2"))
Console.WriteLine(part);
Console.WriteLine();
Console.WriteLine("TEST 2");
foreach(var part in getParts("Category A > THIS IS DATA IS REDUNDANT|Category A > Sub Category 1 > THIS IS DATA IS REDUNDANT|Category A > Sub Category 1 > Sub Sub Category 1|Category A > Sub Category 1 > Sub Sub Category 2"))
Console.WriteLine(part);
}
public static List<string> getParts(string stringToParse){
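// Materialize the list once so the nested query below does not re-run the split and trim.
var parts = stringToParse.Split('|').Select(part => part.Trim()).ToList();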
return parts.Where(part => !parts.Any(comparePart => part != comparePart && comparePart.StartsWith(part))).ToList();
}
}
Result:
Category A > Sub Category 1
Category B
Category C > Sub Category 2
TEST 2
Category A > THIS IS DATA IS REDUNDANT
Category A > Sub Category 1 > THIS IS DATA IS REDUNDANT
Category A > Sub Category 1 > Sub Sub Category 1
Category A > Sub Category 1 > Sub Sub Category 2
In other words: keep only the parts that do not form the beginning of another part.
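One caveat with a raw StartsWith check is that "Category A" also counts as the beginning of "Category AB", even though they are different categories. The sketch below keeps the same "not the beginning of another part" idea but compares level by level instead. The names `PrefixFilter`, `IsAncestor` and `GetParts` are hypothetical, and it assumes '>' separates levels as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixFilter
{
    // Hypothetical helper: true when 'candidate' is a strict ancestor of 'other',
    // compared one level at a time rather than character by character.
    public static bool IsAncestor(string[] candidate, string[] other)
    {
        return candidate.Length < other.Length
            && candidate.SequenceEqual(other.Take(candidate.Length), StringComparer.Ordinal);
    }

    public static List<string> GetParts(string stringToParse)
    {
        // Split each entry into its trimmed levels once, up front.
        var parts = stringToParse.Split('|')
            .Select(p => p.Split('>').Select(s => s.Trim()).ToArray())
            .ToList();

        // Keep only the entries that are not an ancestor of any other entry.
        return parts
            .Where(part => !parts.Any(other => IsAncestor(part, other)))
            .Select(part => string.Join(" > ", part))
            .ToList();
    }
}
```

This still compares every entry with every other entry, so it is quadratic in the number of entries per row; with only a handful of categories per row that is unlikely to be a problem.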
After you split on the (|), go through the list and count how many times each item occurs within the initial string. If an item occurs more than once, remove it; the resulting list is what you need. I took the occurrence count from "How would you count occurrences of a string within a string?", as it looks like the fastest approach so far.
string strProductCategories = "Category A|Category A > Sub Category 1|Category B|Category C|Category C > Sub Category 2";
List<string> firstSplitResults = strProductCategories.SplitAndTrim('|');
for (int i = 0; i < firstSplitResults.Count; i++)
{
int occCount = (strProductCategories.Length - strProductCategories.Replace(firstSplitResults[i], "").Length) / firstSplitResults[i].Length;
if (occCount > 1)
{
firstSplitResults.RemoveAt(i);
i--;
}
}
// print result
for (int i = 0; i < firstSplitResults.Count; i++)
{
Console.WriteLine(firstSplitResults[i]);
}
Console.ReadLine();
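The counting step can also be written without building a replaced copy of the string. This is only a sketch of that one step: `OccurrenceCounter.Count` is a hypothetical helper, not part of the code above, and it counts non-overlapping matches with an ordinal comparison.

```csharp
using System;

public static class OccurrenceCounter
{
    // Hypothetical helper: counts non-overlapping occurrences of 'value' in 'source'.
    public static int Count(string source, string value)
    {
        // An empty search value would never advance the index, so treat it as "not found".
        if (string.IsNullOrEmpty(source) || string.IsNullOrEmpty(value))
        {
            return 0;
        }

        int count = 0;
        int idx = 0;
        while ((idx = source.IndexOf(value, idx, StringComparison.Ordinal)) >= 0)
        {
            count++;
            idx += value.Length; // continue searching after this match
        }
        return count;
    }
}
```

Either way, this is plain substring counting: "Category A" is also counted inside "Category AB", and an entry that appears twice verbatim would be removed entirely, so it is worth checking the real data for those cases.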