
Extracting data from text file with differing delimiters

I have a text file that I need to split into an array, where each element contains the data for one person. I will then use Regex (C#) to extract all the data for that person. The problem I am having is matching the start of each person, because the pattern changes within the file.

A simplified version of the data is below:

Address FirstName \r\nSurname NHS No Age = 44\r\n
Address FirstName\r\n Surname NHS No 12345\r\n
Address FirstName\r\n Surname NHS No Age = 35\r\n
Address FirstName \r\nSurname NHS No 54321\r\n

As you can see, there are line breaks within each person's data, so the StreamReader.ReadLine() method probably won't work on its own. The address, first name and surname fields are fixed-length, so I can extract them using Substring. I can split the file into the array of people once I have a consistent marker for the start or end of each person.

I need to use Regex.Replace to add a start-of-person marker, then use this marker to split the text into the array. I would appreciate any help with this.

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." (Jamie Zawinski)

Are you convinced that regex will make your code easier to write, read and maintain?

Consider using String.Split() instead.

From your comments, it looks like each row represents a single entity, regardless of the nuances of the format. To start, you could read the file line by line and split each line into words using String.Split:

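using System;
using System.IO;

using (StreamReader sr = new StreamReader("addresses.txt"))
{
     string line;
     // Read lines from the file until the end of the file is reached.
     while ((line = sr.ReadLine()) != null)
     {
         // Split on spaces, dropping the empty tokens that repeated or
         // trailing spaces would otherwise produce.
         string[] tokens = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

         // variant 1: Address FirstName Surname NHS No Age = 44
         // variant 2: Address FirstName Surname NHS No 12345

         // Skip lines too short to hold both fields.
         if (tokens.Length < 2) continue;

         Console.WriteLine("Address: {0}", tokens[0]);
         Console.WriteLine("First name: {0}", tokens[1]);

         // etc.
     }
}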
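
If a person's data really does span a line break, as in the sample in the question, reading line by line will cut each person in two. A minimal sketch of the marker approach the question asks about follows. It assumes that every person ends with a digit (the age or the NHS number) immediately before the line break, and that the break inside a person never follows a digit. PersonSplitter, SplitPeople and the choice of U+001E as the marker are illustrative names and choices, not part of the original post.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PersonSplitter
{
    // Hypothetical marker: the ASCII "record separator" control character,
    // assumed never to occur in the real data.
    private const char Marker = '\u001E';

    public static string[] SplitPeople(string text)
    {
        // Assumption: only the line break that ends a person follows a digit.
        // Replace exactly those line breaks with the marker.
        string marked = Regex.Replace(text, @"(?<=\d)\r?\n", Marker.ToString());

        return marked
            .Split(new[] { Marker }, StringSplitOptions.RemoveEmptyEntries)
            // Remove the remaining line break inside each person. Spaces are
            // left alone so fixed-length fields keep their positions.
            .Select(person => Regex.Replace(person, @"\r?\n", ""))
            .ToArray();
    }
}
```

Calling PersonSplitter.SplitPeople(File.ReadAllText("addresses.txt")) would then give one string per person. Because only the line break is removed, both "FirstName \r\nSurname" and "FirstName\r\n Surname" collapse to the same layout, so Substring with fixed offsets should still apply.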
