
C#: How to extract values from a predefined format of string efficiently?

I have a collection of similar strings.

For example, string 1: Customer's first Name is john, his last name is glueck,his company name is abc def technolgies llc, he has a balance of 60 dollars.His spending rate is +3.45%

string 2: Customer's first Name is steve, his last name is johnston,his company name is xyz corporation, he has a balance of 800 dollars.His spending rate is -212.86%

Now I have to extract values such as john, glueck, abc def technolgies llc, 60, +3.45 from string 1 and steve, johnston, xyz corporation, 800, -212.86 from string 2.

In our production environment each string is quite large and I have around 83 fields to extract from each string. What is the best way to extract these values?

Is there any method that does the opposite of string.Format, one that takes the reference string and the actual string and returns the extracted values?

A regular expression will do the trick.

namespace ConsoleApplication
{
    using System;
    using System.Text.RegularExpressions;

    internal static class Program
    {
        private static void Main()
        {
            // \s* after each separator accepts both ", his" and ",his"
            // (and "dollars. His" as well as "dollars.His"), as in the question's samples.
            var expression = new Regex(
                @"Customer's first Name is (?<FirstName>[^,]+),\s*" +
                @"his last name is (?<LastName>[^,]+),\s*" +
                @"his company name is (?<CompanyName>[^,]+),\s*" +
                @"he has a balance of (?<Balance>[0-9]+) dollars\.\s*" +
                @"His spending rate is (?<SpendingRate>[^%]+)%");

            var line = @"Customer's first Name is john, his last name is glueck, his company name is abc def technolgies llc, he has a balance of 60 dollars. His spending rate is +3.45%";

            var match = expression.Match(line);

            // Without this check, a non-matching line would just print empty values.
            if (!match.Success)
            {
                Console.WriteLine("The line does not match the expected format.");
                return;
            }

            Console.WriteLine("First name......{0}", match.Groups["FirstName"]);
            Console.WriteLine("Last name.......{0}", match.Groups["LastName"]);
            Console.WriteLine("Balance.........{0}", match.Groups["Balance"]);
            Console.WriteLine("Spending rate...{0}", match.Groups["SpendingRate"]);

            Console.ReadLine();
        }
    }
}

OUTPUT

First name......john
Last name.......glueck
Balance.........60
Spending rate...+3.45

After that, you can perform some simple string parsing to get numeric values from the captured strings. You will probably also have to write a more robust regular expression if the format of the inputs varies.
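As a rough sketch of that parsing step, something like the following could convert the captured text into numbers. The class and method names (FieldParsing, ParseBalance, ParseSpendingRate) are made up for illustration, and it assumes the balance is a whole number and the rate always uses "." as the decimal separator, as in the samples.

```csharp
using System;
using System.Globalization;

internal static class FieldParsing
{
    // Parses a balance such as "60". NumberStyles.None accepts digits only,
    // which is what the [0-9]+ group captures.
    public static int ParseBalance(string text)
    {
        return int.Parse(text, NumberStyles.None, CultureInfo.InvariantCulture);
    }

    // Parses a spending rate such as "+3.45" or "-212.86" (without the "%").
    // The invariant culture keeps "." as the decimal separator on every machine.
    public static decimal ParseSpendingRate(string text)
    {
        const NumberStyles styles =
            NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint;
        return decimal.Parse(text, styles, CultureInfo.InvariantCulture);
    }
}
```

You would call these with match.Groups["Balance"].Value and match.Groups["SpendingRate"].Value. If malformed input is possible, the TryParse overloads taking the same arguments avoid exceptions.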

(Question: your actual input string is the full wordy text, "Customer's first Name is xxxx, his last name is xxxx, his company name is xxxx" etc. Correct?)

This is probably a good case for a Regex. If you use the compile option (RegexOptions.Compiled), you should get reasonable speed out of it. This is essentially the "reverse string.Format" you asked about (with a whole bunch more options).
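A minimal sketch of the compiled approach, assuming the five-field format from the question: the Regex is built once in a static field and reused for every input line, so the compilation cost is paid a single time. CustomerLineParser and Parse are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

internal static class CustomerLineParser
{
    // Built once and reused. RegexOptions.Compiled costs more at startup
    // but usually matches faster when the same pattern runs many times.
    private static readonly Regex Pattern = new Regex(
        @"Customer's first Name is (?<FirstName>[^,]+),\s*" +
        @"his last name is (?<LastName>[^,]+),\s*" +
        @"his company name is (?<CompanyName>[^,]+),\s*" +
        @"he has a balance of (?<Balance>[0-9]+) dollars\.\s*" +
        @"His spending rate is (?<SpendingRate>[^%]+)%",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // Returns the named fields of one line, or null when the line does not match.
    public static Dictionary<string, string> Parse(string line)
    {
        Match match = Pattern.Match(line);
        if (!match.Success)
        {
            return null;
        }

        var fields = new Dictionary<string, string>();
        foreach (string name in Pattern.GetGroupNames())
        {
            if (name != "0") // group "0" is the whole match, not a field
            {
                fields[name] = match.Groups[name].Value;
            }
        }
        return fields;
    }
}
```

With 83 fields the pattern gets long, but the lookup code stays the same because it iterates over the group names. Whether the compiled option actually pays off depends on how many strings you process, so it is worth measuring.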

UPDATE:

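  // NOTE: pattern assumes a comma after spending rate
  // Verbatim string (@"...") so that \w, \s and \d reach the regex engine unchanged;
  // the dot after "dollars" is escaped so it only matches a literal '.'.
  Regex regex = new Regex(@"Customer's first Name is (\w+), his last name is (\w+),his company name is ([\w\s]+), he has a balance of (\d+) dollars\.His spending rate is ([^,]+)");

  // Split includes the text captured by each group in the result; the first and
  // last elements are the (possibly empty) text before and after the match.
  string[] values = regex.Split(string1);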
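If you would rather keep the reference string close to what you would pass to string.Format, one possible sketch is to generate the regex from a template. This is not a built-in .NET feature: ReverseFormat and BuildPattern are hypothetical names, and the sketch assumes placeholders are written as {Name} (letters, digits, underscore, not starting with a digit) and that the literal text between them matches the input exactly, including spacing.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

internal static class ReverseFormat
{
    // Turns a template such as "name is {First}, age is {Age}" into a regex
    // with one lazy named group per {Placeholder}.
    public static Regex BuildPattern(string template)
    {
        var pattern = new StringBuilder("^");
        int position = 0;

        foreach (Match placeholder in Regex.Matches(template, @"\{([A-Za-z_]\w*)\}"))
        {
            // Literal text between placeholders is escaped so it matches as-is.
            string literal = template.Substring(position, placeholder.Index - position);
            pattern.Append(Regex.Escape(literal));

            // Each placeholder becomes a named group capturing as little as possible.
            pattern.Append("(?<").Append(placeholder.Groups[1].Value).Append(">.+?)");
            position = placeholder.Index + placeholder.Length;
        }

        // Trailing literal text, then anchor at the end of the input.
        pattern.Append(Regex.Escape(template.Substring(position))).Append("$");
        return new Regex(pattern.ToString(), RegexOptions.Compiled);
    }
}
```

For the strings in the question, the template would be "Customer's first Name is {FirstName}, his last name is {LastName},his company name is {CompanyName}, he has a balance of {Balance} dollars.His spending rate is {SpendingRate}%", and the values are then read with match.Groups["FirstName"].Value and so on. Because every group is just .+?, this is less strict than a hand-written pattern and can capture the wrong text when a field value itself contains the following literal text.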


 