
How do I determine a delimiter in a text file

I have two types of input files:

1. Comma delimited (e.g. lastName, firstName, Address)
2. Space delimited (e.g. lastName firstName Address)

The comma delimited file HAS spaces between the ',' and the next word.

How do I go about determining which type of file I am dealing with? I am using C#, by the way.

I've done tons of work with various delimited file types and as everyone else is saying, without normalization you can't really handle the whole thing programmatically.

Generally (and it seems like it would be totally necessary for space-delimited files) a delimited file will have a text qualifier character (often double quotes). A couple of examples of this point:

Space Delimited:

A lastName such as "Von Marshall" is impossible to parse without qualifiers.

Addresses would be altogether impossible as well.

Comma Delimited:

Addresses are generally unworkable unless they are broken into separate fields, or unless having one solid string is acceptable for your use case.

So the space-delimited file should be easy enough to identify, since you're looking for " ". If that is the case, I'd (personally) replace all " " with "," to convert it to comma-delimited. That way you only have to build a single method for handling the text; otherwise I imagine you'll need separate methods for spaces and commas.

If your comma-delim file does not have a text qualifier, you're in a really tricky spot. I haven't found any "perfect" way of addressing this without any human work, but it can be minimized. I've used Notepad++ a lot to do batch replacement with its regular expression functions.

However, you can also use C#'s regex abilities. Here's what MSDN says on that.

So, to answer your question to the best of my ability: unless you can establish something unique that distinguishes the 2 file types, there's no way. However, if the text has proper text qualifiers, the files have different file extensions, or they are generated in different directories, you could use any of those qualities, or a mix of them, to decide what type of file it is. I have no experience doing this as yet (though I've just started a project using it), so I can't give an exact example, but for anyone to build a perfect example it would be best if you showed example strings for each file.
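To illustrate the regex route, here is a minimal sketch that assumes the files do use double quotes as the text qualifier. The class name `QualifiedFieldReader` and the pattern are made up for this example, and the pattern does not handle escaped quotes inside a field. With qualifiers in place, the same pattern reads both comma-delimited and space-delimited lines, so the file type no longer needs to be known.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class QualifiedFieldReader
{
    // Either a double-quoted run (captured without the quotes) or a bare run
    // of characters that are not whitespace, commas, or quotes.
    private static readonly Regex FieldPattern =
        new Regex("\"([^\"]*)\"|[^\\s,\"]+");

    public static List<string> ReadFields(string line)
    {
        var fields = new List<string>();
        foreach (Match match in FieldPattern.Matches(line))
        {
            // Group 1 participates only when the quoted alternative matched.
            fields.Add(match.Groups[1].Success ? match.Groups[1].Value : match.Value);
        }
        return fields;
    }
}
```

Without qualifiers this falls apart in the way described above: an unquoted "Von Marshall" comes back as two fields.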

As other users have said, without some guarantee that the space-delimited version contains no commas, you cannot tell the two apart with 100% accuracy.

With some extra information, say that every record always has exactly three fields when parsed correctly, you could simply parse each line both ways and test which result has the correct number of fields. Address is a big obstacle here, though, since we do not know what its format could be. These rules also seem odd at best when talking about addresses. Is

1111somestreest.houston,tx11111 or
1111 somestreet st. Houston, Tx 11111

a valid format?
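The "parse both ways and check the field count" idea could look roughly like the sketch below. It assumes exactly three fields per record and that no field contains its own delimiter, which is exactly what an address may violate. `LineFormat` and `FieldCountProbe` are hypothetical names.

```csharp
using System;

public enum LineFormat { Unknown, Comma, Space }

// Hypothetical helper, not part of any library.
public static class FieldCountProbe
{
    public static LineFormat Classify(string line, int expectedFields = 3)
    {
        int byComma = line.Split(',').Length;
        int bySpace = line
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Length;

        // Comma is checked first: a comma-delimited line also contains
        // spaces, so it can pass the space test as well.
        if (byComma == expectedFields) return LineFormat.Comma;
        if (bySpace == expectedFields) return LineFormat.Space;
        return LineFormat.Unknown;
    }
}
```

A line such as `Smith John 12 Main St` comes back as `Unknown`, which is the honest answer when the address contains spaces and there is no qualifier.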

You could count the number of commas on each line of the file. If every line has at least 2 commas (given that your fields are last name, first name, address), you probably have a comma-separated file. If at least one line has fewer than 2 commas, you should treat the file as space separated.
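A minimal sketch of that comma count, assuming three fields per record. `DelimiterSniffer` is a hypothetical name, and the result is only a heuristic: a space-delimited file whose addresses happen to contain two commas on every line would be misclassified.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class DelimiterSniffer
{
    // Returns ',' when every non-blank line has at least
    // (expectedFields - 1) commas, otherwise ' '.
    public static char Detect(IEnumerable<string> lines, int expectedFields = 3)
    {
        int requiredCommas = expectedFields - 1;
        List<string> nonBlank = lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .ToList();

        if (nonBlank.Count == 0)
            throw new ArgumentException("No non-blank lines to inspect.", nameof(lines));

        bool everyLineHasEnoughCommas = nonBlank
            .All(line => line.Count(c => c == ',') >= requiredCommas);

        return everyLineHasEnoughCommas ? ',' : ' ';
    }
}
```

For a real file, the lines could come from `File.ReadLines(path)` in `System.IO`.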

I, however, would skip this step and ignore the commas when evaluating the input: replace all of them with spaces and implement a single read/grab information procedure that only has to handle space-separated files.
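A rough sketch of that single procedure follows. It assumes last and first names never contain spaces and that everything after the second token belongs to the address; any commas inside the address are lost in the process. `SingleReader` is a hypothetical name.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class SingleReader
{
    private static readonly char[] Whitespace = { ' ', '\t' };

    // Returns { lastName, firstName, address } for either file type.
    public static string[] ReadRecord(string line)
    {
        // Treat commas as spaces, then split on runs of whitespace.
        string[] tokens = line
            .Replace(',', ' ')
            .Split(Whitespace, StringSplitOptions.RemoveEmptyEntries);

        if (tokens.Length < 3)
            throw new FormatException("Expected at least 3 fields: " + line);

        // Everything after the first two tokens is treated as the address.
        string address = string.Join(" ", tokens.Skip(2));
        return new[] { tokens[0], tokens[1], address };
    }
}
```

Both `Smith, John, 12 Main St` and `Smith John 12 Main St` produce the same three fields this way, so the file type never has to be detected.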

