
Clean, fast and efficient way to parse a string in C#

I have to create a string parser in C#. The string needs to be parsed into a parent-child relation. It looks like this:

Water, Bulgur Wheat (29%), Sweetened Dried Cranberries (5%) (Sugar, Cranberries), Sunflower Seeds (3%), Onion (3%), Green Lentils (2%), Palm Oil, Flavourings (contain Barley), Lemon Juice Powder (<2%) (Maltodextrin, Lemon Juice Concentrate), Ground Spices (<2%) (Paprika, Black Pepper, Cinnamon, Coriander, Cumin, Chilli Powder, Cardamom, Pimento, Ginger), Dried Herbs (<2%) (Coriander, Parsley, Mint), Dried Garlic (<2%), Salt, Maltodextrin, Onion Powder (<2%), Cumin Seeds, Dried Lemon Peel (<2%), Acid (Citric Acid)

I know I could go char by char and eventually find my way through it, but what's the easiest way to get this information?

Expected output:

[Image: expected parent-child output]

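using System.Text;

public static string ParseString(string input)
{
    StringBuilder sb = new StringBuilder();
    bool skipNext = false; // set after a comma so the following space is dropped
    foreach (char c in input)
    {
        if (skipNext)
        {
            skipNext = false;
            // Only swallow the character after a comma if it really is whitespace;
            // otherwise input such as "a,b" would lose the "b".
            if (char.IsWhiteSpace(c))
                continue;
        }

        switch (c)
        {
            case '(':
                sb.Append("\n\t"); // a child starts on a new, indented line
                break;
            case ',':
                sb.Append("\n");   // the next sibling starts on a new line
                skipNext = true;
                break;
            case ')':
                sb.Append("\n");   // end of a child group
                break;
            default:
                sb.Append(c);
                break;
        }
    }

    return sb.ToString();
}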

This should get you started. It does not handle parentheses that do not denote children, such as the percentages in (29%).
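The loop above only indents the first child after an opening parenthesis; in "(Sugar, Cranberries)" the second child lands back at column 0. A minimal variation, sketched under the assumption that parentheses may nest, tracks the current depth and prefixes every line with that many tabs. `IndentingFormatter`, `FormatTree` and `EndLine` are hypothetical names. Like the original, it still treats a percentage such as (29%) as a child.

```csharp
using System.Text;

public static class IndentingFormatter
{
    // Hypothetical variant of ParseString: one line per item, indented by nesting depth.
    public static string FormatTree(string input)
    {
        var sb = new StringBuilder();
        int depth = 0;           // how many '(' are currently open
        bool atLineStart = true; // true until the first visible char of a line is written

        foreach (char c in input)
        {
            switch (c)
            {
                case '(':
                    depth++;
                    EndLine(sb, ref atLineStart);
                    break;
                case ')':
                    if (depth > 0) depth--; // tolerate a stray ')'
                    EndLine(sb, ref atLineStart);
                    break;
                case ',':
                    EndLine(sb, ref atLineStart);
                    break;
                default:
                    if (atLineStart)
                    {
                        if (char.IsWhiteSpace(c)) break; // drop leading spaces, e.g. after ", "
                        sb.Append('\t', depth);          // indent by current depth
                        atLineStart = false;
                    }
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString().TrimEnd('\n');
    }

    // Starts a new line unless one was just started (avoids blank lines for "), ").
    private static void EndLine(StringBuilder sb, ref bool atLineStart)
    {
        if (atLineStart) return;
        while (sb.Length > 0 && sb[sb.Length - 1] == ' ')
            sb.Length--; // drop trailing spaces such as the one in "Onion ("
        sb.Append('\n');
        atLineStart = true;
    }
}
```

For example, `FormatTree("A (B (C), D)")` should return `"A\n\tB\n\t\tC\n\tD"`: each level of parentheses adds one tab, and siblings at the same level line up.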

After looking at the data posted (Water, Bulgur…), one issue will be distinguishing and separating each individual item: 1 Water, 2 Bulgur…, 3 Sweetened…

Splitting on commas “,” will not work, because some parentheses “()” contain commas as well, as in (Sugar, Cranberries). These items (Sugar, Cranberries) are SUB items of Sweetened Dried Cranberries, so splitting the whole string on commas would cut them apart.
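To see the problem concretely, here is a minimal illustration on a shortened excerpt of the data. `NaiveSplitDemo` is a hypothetical name; the only real work is `string.Split`.

```csharp
public static class NaiveSplitDemo
{
    // Plain Split(',') knows nothing about parentheses, so the comma inside
    // "(Sugar, Cranberries)" cuts that ingredient in two.
    public static string[] NaiveSplit(string input) => input.Split(',');
}
```

`NaiveSplit("Sweetened Dried Cranberries (5%) (Sugar, Cranberries), Salt")` returns three pieces, `"Sweetened Dried Cranberries (5%) (Sugar"`, `" Cranberries)"` and `" Salt"`, although the excerpt contains only two top-level ingredients.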

Given your data, I would consider changing its format to accommodate this situation. A simple change would be to replace the comma delimiter between the sub-items with something else; a dash “-” may work.

The Regex code below does exactly that: it changes each comma “,” that is between an opening and a closing parenthesis “()” to a dash “-”. After that, a split on commas identifies each item.

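private static string ReplaceCommaBetweenParens(string inString) {
  // A comma preceded by "(" (with no ")" in between) and followed by ")"
  // (with no "(" in between) is inside a group: replace it with "-".
  string pattern = @"(?<=\([^\)]*)+,(?!\()(?=[^\(]*\))";
  return Regex.Replace(inString, pattern, "-");
}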
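Since critique of the pattern is welcome: the `+` after the lookbehind repeats a zero-width assertion at the same position, and `(?!\()` is already implied by the final lookahead (a comma immediately followed by "(" cannot satisfy `[^(]*\)`). The shorter pattern below should therefore behave the same, assuming the groups are not nested. It relies on .NET's support for variable-length lookbehind. `CommaReplacer` is a hypothetical name, and compiling the `Regex` once as a static field is a design choice, not part of the answer.

```csharp
using System.Text.RegularExpressions;

public static class CommaReplacer
{
    //   (?<=\([^)]*)  an opening '(' appears earlier, with no ')' in between
    //   ,             the comma to replace
    //   (?=[^(]*\))   a ')' appears later, with no '(' in between
    private static readonly Regex CommaInsideParens =
        new Regex(@"(?<=\([^)]*),(?=[^(]*\))");

    public static string ReplaceCommaBetweenParens(string inString) =>
        CommaInsideParens.Replace(inString, "-");
}
```

For instance, `"Water, Cranberries (5%) (Sugar, Cranberries), Salt"` becomes `"Water, Cranberries (5%) (Sugar- Cranberries), Salt"`: only the comma inside the parentheses changes, and the space after it is kept.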

The code above is not pretty. I got it from somewhere else and wish I could cite the original author; I welcome all Regex aficionados to critique the pattern. I am not sure how you would accomplish this with regular string methods (Split/IndexOf), but I am confident it would take several steps. It is a good example of how useful Regex can be in some situations: it may be ugly, but it works, and it is fast. Fortunately, that cryptic pattern is not needed after this step.
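For comparison, one way to do it with plain string methods, sketched under the assumption that parentheses are balanced: scan once, count open parentheses, and cut only at commas found at depth zero. This skips the comma-to-dash step entirely (sub-items keep their commas) and also copes with nested parentheses. `TopLevelSplitter` and `SplitTopLevel` are hypothetical names.

```csharp
using System.Collections.Generic;

public static class TopLevelSplitter
{
    // Split at commas that are outside all parentheses; commas inside "( ... )" are kept.
    public static List<string> SplitTopLevel(string input)
    {
        var parts = new List<string>();
        int depth = 0; // number of '(' currently open
        int start = 0; // start index of the current top-level item

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '(') depth++;
            else if (c == ')' && depth > 0) depth--;
            else if (c == ',' && depth == 0)
            {
                parts.Add(input.Substring(start, i - start).Trim());
                start = i + 1;
            }
        }
        parts.Add(input.Substring(start).Trim()); // the last item has no trailing comma
        return parts;
    }
}
```

`SplitTopLevel("A (B (C, D), E), F")` should yield two items, `"A (B (C, D), E)"` and `"F"`, because every comma except the last one sits inside parentheses.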

Once this change has been made, it is a fairly straightforward process to indent your output as required. Each row of data may hold one or more items separated by commas “,”; the code below loops through the items of one such string and parses each of them (if your rows come from a DataTable, run it once per row). I made a simple class to hold the items; however, the code is peppered with commented-out Console.WriteLine calls that print the correct output if a class is not needed. Hope this helps.

Simple Class to hold the individual items

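using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

class Ingredient {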

  int ID { get; set; }
  string Name { get; set; }
  string Percent { get; set; }
  List<string> Ingredients { get; set; }

  public Ingredient(int id, string name, string pct, List<string> ingredients) {
    ID = id;
    Name = name;
    Percent = pct;
    Ingredients = ingredients;
  }

  public override string ToString() {
    StringBuilder sb = new StringBuilder();
    sb.Append(ID + "\t" + Name + " " + Percent + Environment.NewLine);
    foreach (string s in Ingredients) {
      sb.Append("\t\t" + s + Environment.NewLine);
    }
    return sb.ToString();
  }
}
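The `Ingredient` class holds exactly one level of sub-items. If parentheses can nest, a recursive structure may fit the parent-child relation better. Below is a minimal recursive-descent sketch, assuming balanced parentheses and treating a group that holds a single value ending in "%" as the percentage; `Node`, `TreeParser` and their members are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical tree node: children are Nodes, so nesting can go to any depth.
public class Node
{
    public string Name { get; set; } = "";
    public string Percent { get; set; } = "";
    public List<Node> Children { get; } = new List<Node>();
}

public static class TreeParser
{
    public static List<Node> Parse(string input)
    {
        int pos = 0;
        return ParseList(input, ref pos);
    }

    // list := item (',' item)*; stops at ')' or at the end of the input
    private static List<Node> ParseList(string s, ref int pos)
    {
        var nodes = new List<Node>();
        while (pos < s.Length)
        {
            nodes.Add(ParseItem(s, ref pos));
            if (pos < s.Length && s[pos] == ',') pos++; // another sibling follows
            else break;                                  // ')' or end of input
        }
        return nodes;
    }

    // item := text ( '(' list ')' )*
    private static Node ParseItem(string s, ref int pos)
    {
        var node = new Node();
        var name = new StringBuilder();
        while (pos < s.Length && s[pos] != ',' && s[pos] != ')')
        {
            if (s[pos] != '(')
            {
                name.Append(s[pos++]);
                continue;
            }
            pos++;                                      // skip '('
            List<Node> group = ParseList(s, ref pos);
            if (pos < s.Length && s[pos] == ')') pos++; // skip ')'

            // Assumption: "(5%)" or "(<2%)" is a percentage, not a sub-ingredient.
            if (group.Count == 1 && group[0].Children.Count == 0
                && group[0].Name.EndsWith("%", StringComparison.Ordinal))
                node.Percent = group[0].Name;
            else
                node.Children.AddRange(group);
        }
        node.Name = name.ToString().Trim();
        return node;
    }
}
```

Each call to `ParseItem` reads one name plus any parenthesized groups after it, and each group is parsed by `ParseList`, which is what lets "A (B (C, D), E)" produce a child B that itself has children C and D.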

Code to use the above class

static string ingredients = "Water, Bulgur Wheat(29%), Sweetened Dried Cranberries(5%) (Sugar, Cranberries)," +
                              " Sunflower Seeds(3%), Onion(3%), Green Lentils(2%), Palm Oil, Flavourings (contain Barley)," +
                              " Lemon Juice Powder(<2%) (Maltodextrin, Lemon Juice Concentrate)," + 
                              " Ground Spices(<2%) (Paprika, Black Pepper, Cinnamon, Coriander, Cumin, Chilli Powder, Cardamom, Pimento, Ginger)," + 
                              " Dried Herbs(<2%) (Coriander, Parsley, Mint), Dried Garlic(<2%), Salt, Maltodextrin, Onion Powder(<2%)," + 
                              " Cumin Seeds, Dried Lemon Peel(<2%), Acid(Citric Acid)";

static List<Ingredient> allIngredients;

static void Main(string[] args) {
  allIngredients = ParseString(ingredients);
  foreach (Ingredient curIngredient in allIngredients) {
    Console.Write(curIngredient.ToString());
  }
  Console.ReadLine();
}

private static List<Ingredient> ParseString(string inString) {
  List<Ingredient> allIngredients = new List<Ingredient>();
  string temp = ReplaceCommaBetweenParens(inString); // use the argument, not the static field
  string[] allItems = temp.Split(',');
  int count = 1;
  foreach (string curItem in allItems) {
    if (curItem.Contains("(")) {
      allIngredients.Add(ParseItem(curItem, count));
    }
    else {
      allIngredients.Add(new Ingredient(count, curItem.Trim(), "", new List<string>()));
      //Console.WriteLine(count + "\t" + curItem.Trim());
    }
    count++;
  }
  return allIngredients;
}

private static Ingredient ParseItem(string item, int count) {
  string pct = "";
  List<string> items = new List<string>();
  int firstParenIndex = item.IndexOf("(");
  //Console.Write(count + "\t" + item.Substring(0, firstParenIndex).Trim());

  Regex expression = new Regex(@"\((.*?)\)");
  MatchCollection matches = expression.Matches(item);
  bool percentPresent = true;
  foreach (Match match in matches) {
    if (match.ToString().Contains("%")) {  // <-- if the text between the parentheses does not contain "%", move to the next line; otherwise print on the same line
      //Console.WriteLine(" " + match.ToString().Trim());
      pct = match.ToString().Trim();
      percentPresent = false;
    }
    else {
      if (percentPresent) {
        //Console.WriteLine();
      }
      items = GetLastItems(match.ToString().Trim());
    }
  }
  return new Ingredient(count, item.Substring(0, firstParenIndex).Trim(), pct, items);
}
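`ParseItem` loops over every parenthesized group in an item. An alternative, sketched here with a single pattern that uses named groups, assumes each item is a name, then an optional percentage group, then an optional group of sub-items, with no deeper nesting. It expects the sub-items to still be separated by commas (for example, items produced by splitting only at top-level commas); with the dash-replaced text it would need `Split('-')` instead. `ItemParser` is a hypothetical name, and returning a tuple instead of an `Ingredient` is only to keep the sketch self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ItemParser
{
    // name, then an optional "(..%)" group, then an optional "(a, b, c)" group.
    // Assumes at most one level of parentheses per item.
    private static readonly Regex ItemPattern = new Regex(
        @"^\s*(?<name>[^(]+?)\s*(?:\((?<pct>[^()]*%)\))?\s*(?:\((?<subs>[^()]*)\))?\s*$");

    public static (string Name, string Percent, List<string> Children) ParseItem(string item)
    {
        Match m = ItemPattern.Match(item);
        if (!m.Success)
            return (item.Trim(), "", new List<string>()); // fall back: treat as a plain name

        List<string> children = m.Groups["subs"].Success
            ? m.Groups["subs"].Value.Split(',').Select(s => s.Trim()).ToList()
            : new List<string>();

        // An unmatched optional group has an empty Value, so Percent is "" when absent.
        return (m.Groups["name"].Value, m.Groups["pct"].Value, children);
    }
}
```

For `"Dried Herbs (<2%) (Coriander, Parsley, Mint)"` this yields the name `"Dried Herbs"`, the percentage `"<2%"` and three children; for `"Flavourings (contain Barley)"` the single group has no "%", so it becomes a child rather than a percentage.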

private static List<string> GetLastItems(string inString) {
  List<string> result = new List<string>();
  string temp = inString.Replace("(", "");
  temp = temp.Replace(")", "");
  string[] allItems = temp.Split('-');
  foreach (string curItem in allItems) {
    //Console.WriteLine("\t\t" + curItem.Trim());
    result.Add(curItem.Trim());
  }
  return result;
}

private static string ReplaceCommaBetweenParens(string inString) {
  string pattern = @"(?<=\([^\)]*)+,(?!\()(?=[^\(]*\))";
  return Regex.Replace(inString, pattern, "-");
}
