How to parse a file for iPhone? Should I use NSScanner?

I am new to iPhone development, and I am trying to learn how to read a CSV file and save its contents using Core Data (I think that is the best way?) so that I can display them in a table view on the iPhone. Below is an example of the type of CSV file I am working with:

13,1,1,History,,,,,,1
,263,1,Smith,Bob,Freshman
,317,2,Jones,John,Sophmore
14,2,1,Math,,,,,,1
,311,1,Kim,Mary,Freshman
,352,2,Doe,Fred,Senior

In a class line, the first number is the type of class (i.e., 13 = history), the second number is how many sections there are (i.e., 1 = 1 class), the third number is the meeting pattern (i.e., 1 = Monday, Wednesday, Friday), and then comes the name of the class. I don't really care about the rest of the class line, so I thought I could just have the parser ignore those characters.

For the 2nd and 3rd lines, the first number after the comma is the student number, then the seat number, then last name, first name, and year in school.

So I think I have two main challenges. The first is how to parse the data so that I know how many students, and which ones, are in each class, so I can load them into a table view (and add to it later). The second is that I don't know how to associate an integer value with the meeting pattern or class name (i.e., 13 = history, and 1 = Mon/Wed/Fri).

thank you so much for any help

You packed a lot of questions into a short space!

The answer to your main question is "yes". You should use NSScanner if you have to import a CSV file into your application.

CSV (comma-separated values) is a very tricky file format. At first glance, it looks very simple. Unfortunately, in practice it is anything but simple! It is not even particularly well defined, as it turns out. CSV is the closest thing to a messy "hack" of any type of data file I have ever seen!

CSV files were apparently cooked up by Microsoft to work like BASIC "DATA" statements, something they had been parsing just fine since the mid-1970s. Unfortunately, they should have left it at that. DATA statements were never intended as a file format, just a shortcut to avoid the bother of putting simple data into files in the first place. They were better than hardcoding assignment statements, but that was about it.

The problem with scanning a CSV file is that it uses as its delimiters things that can occur in the data itself. That does not stop it from being usable. It just adds complications, and these complications add complexity to the scanner. The complexity arrives in the form of special cases. You basically have to define a formal grammar for the CSV file.

Naive implementations do not do this. Your course names or one of the two name fields might contain commas. For instance, if there is a "junior" in the class, they might have ", Jr." after their name. Or a class might be called "trig, honors". The user might enter that, and the front-end program (which may be a simple text editor, after all) might not prevent it. So long as the field is wrapped in quotation marks, it is allowed to have commas embedded in it.

I once picked up the ball on a CSV import routine someone had written and checked into our version control as supposedly done. Thing was, it was not. It was blowing up on the fourth record of the file. That record had a field containing something like "XYZ, INC.". Well, the embedded comma was throwing it off.

I wrote a real lexer (scanner) for parsing our CSV import file. That solved the problem. The import routine actually started working then, so I checked it in and marked the "bug" (cough, cough) as "fixed".
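To see concretely why a naive split goes wrong, here is a minimal sketch. The function name NaiveCSVFields and the sample line in the note below are made up for illustration. The function splits on every comma with componentsSeparatedByString: and pays no attention to quotation marks, so a quoted field containing a comma comes back in two pieces.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from any library): splits on every comma,
// with no regard for quotation marks around a field.
static NSArray *NaiveCSVFields(NSString *line) {
    return [line componentsSeparatedByString:@","];
}
```

Calling NaiveCSVFields(@"7,\"XYZ, INC.\",active") yields four strings instead of the intended three, because the quoted company name is cut at its embedded comma.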

As you may have guessed, I think it is highly unlikely the programmer was unaware that his code was unusable, especially given that it did not work on the baseline test data. If you are writing this application for other people to use or going for a high grade, do not finesse the CSV input scanner: do a good job. Anything less would be "faking it".

That is why the best way to handle this issue is to shoot down CSV file support in the requirements stage! Tell stakeholders that tab-delimited files are a much, much simpler format to parse. The only time that CSV has an edge over tab-delimited is the case when embedded newlines and/or tabs need to be supported in any fields of the records.

However, if your project is such a case - try a different tack so you can avoid CSV: suggest XML. Then, write a schema (DTD, XSD, or RNC which is my personal favorite) so you can validate the input with whatever XML parsing API you are using, if your parsing engine supports validation.

If you are stuck with CSV, then luckily there is a very good tutorial showing how to read a CSV file using NSScanner, called Writing a parser using NSScanner (a CSV parsing example).

Take a look. As you can see, it is not simple, but the author did not make it overly complex either. You might want to bookmark the whole blog as well as the article. It is a really excellent weblog for Cocoa developers.

Another example of a CSV scanner is found in Cocoa for Scientists (Part XXVI): Parsing CSV Data. Though it does not explain the problems that CSV presents in as much depth, nor go into design, you can still see that usable code has to do more than simply split the line up using comma and newline characters.

The next part of your program you will have to think about is Developing with Core Data. Make sure you create your Cocoa iPhone application in the IDE with Core Data checked. Also, do as much of your data modeling as possible in Xcode's data model editor, and as much of your GUI design as possible in Interface Builder, rather than struggling to write a lot of code manually.

To store the records using the Core Data part of Cocoa you will need to define a subclass of NSManagedObject (see the NSManagedObject class reference). This is a good time, by the way, to look over the Core Data Class Overview at Cocoa Dev Central. Acquaint yourself there with the fundamental object types in Core Data. The diagrams and explanations will make it really clear how the different Core Data abstractions and classes fit together in your application.

Pick a good business object (problem domain) name like Enrollment, or better yet, CourseRegistration. Be careful not to pick something that sounds like a solution-domain thing. In this case, I would specifically stay away from bare words like class, registration, or schedule, since those have special meanings in programming. No sense muddying the line between problem domain and solution domain.

If you want to set up a real database, and not just dump the records you read from the CSV import file into a single table, you will probably also want to define NSManagedObject subclasses called Student and Course. You will use Xcode's data model editor to tell Core Data that there are relationships between these two and CourseRegistration.

There is an example in a tutorial that I cite below that shows you how to set up these relationships.

Here is a somewhat outdated walkthrough of the steps to create a Core Data application: Build a Core Data App. It is not outdated programming-wise.

It is just that the Xcode IDE and Interface Builder tool have had some of their dialog boxes heavily reworked. The functionality is the same but it makes screenshots in tutorials written a few years or more ago a bit hard to follow sometimes.

If you have not worked with Core Data before, be aware that an Entity is a persistent object type, instances of which get stored in (and loaded from) the database (or file store). Attributes are basically the fields of the entity. Attributes have names and types, just like properties do.

There are some rules you have to follow when subclassing NSManagedObject, as well as when using Core Data in general. So I encourage you to read the Introduction to Core Data Programming Guide. At least now that you have gotten a bit of help here and from the tutorials, you will not be hitting it cold.

Conveniently, Cocoa provides you with both an NSScanner class to simplify reading your CSV import file and the Core Data facility you decided to use to persist your data.
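To make "entity" and "attribute" concrete, here is a minimal sketch that builds a tiny model in code and backs it with an in-memory store. In a real project you would normally draw the model in Xcode's data model editor instead. The entity name Student, the attribute names lastName and seat, and the two helper function names are illustrative assumptions, and the code assumes ARC and an SDK recent enough to have initWithConcurrencyType:.

```objective-c
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Builds a one-entity model: "Student" with a string attribute "lastName"
// and an integer attribute "seat". All names here are illustrative.
static NSManagedObjectModel *MakeStudentModel(void) {
    NSAttributeDescription *lastName = [[NSAttributeDescription alloc] init];
    [lastName setName:@"lastName"];
    [lastName setAttributeType:NSStringAttributeType];

    NSAttributeDescription *seat = [[NSAttributeDescription alloc] init];
    [seat setName:@"seat"];
    [seat setAttributeType:NSInteger32AttributeType];

    NSEntityDescription *student = [[NSEntityDescription alloc] init];
    [student setName:@"Student"];
    [student setProperties:[NSArray arrayWithObjects:lastName, seat, nil]];

    NSManagedObjectModel *model = [[NSManagedObjectModel alloc] init];
    [model setEntities:[NSArray arrayWithObject:student]];
    return model;
}

// Creates a context whose store lives only in memory, so nothing is
// written to disk. Returns nil if the store cannot be added.
static NSManagedObjectContext *MakeInMemoryContext(NSManagedObjectModel *model) {
    NSPersistentStoreCoordinator *coordinator =
        [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:model];
    NSError *error = nil;
    if (![coordinator addPersistentStoreWithType:NSInMemoryStoreType
                                   configuration:nil
                                             URL:nil
                                         options:nil
                                           error:&error]) {
        return nil;
    }
    NSManagedObjectContext *context = [[NSManagedObjectContext alloc]
        initWithConcurrencyType:NSMainQueueConcurrencyType];
    [context setPersistentStoreCoordinator:coordinator];
    return context;
}
```

With a context in hand, [NSEntityDescription insertNewObjectForEntityForName:@"Student" inManagedObjectContext:context] creates an instance of the entity, and setValue:forKey: fills in its attributes.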

As you point out, you are going to want a GUI for editing your dataset. Cocoa uses the Model-View-Controller triad as its GUI design pattern.

NSTableView might be a good thing to put into your application's GUI to do that. That will give you a table view of the records in your user interface. (NSTableView is the AppKit class used on the Mac; on the iPhone, the UIKit counterpart is UITableView.)

There is an NSTableView Tutorial you should take a look at over at CocoaDev. The CocoaDev site is quite an excellent resource to guide you through all kinds of areas of Cocoa programming. If you need more help, there is Another NSTableView Tutorial there.

People often get stuck wondering what to use as the controller with an NSTableView. I suggest reading up on the NSArrayController class. You will probably want to have a look at some Cocoa Bindings Examples and Hints. Cocoa Bindings, in a Nutshell, are a way to keep an attribute in a view in sync with a property of a model object.

Most programmers eventually realize that most database applications simply: collect data, move data values around, and initiate and then propagate changes to pieces of it. Model-View-Controller architecture separates your UI from your business objects in a nice, loosely-coupled fashion.

A binding is a declarative mechanism for linking them together. They are still kept loosely coupled though, so do not worry that bindings violate any architectural rules of the MVC design pattern.

Bindings are handy for doing rapid application development using WYSIWYG GUI-building tools like Interface Builder.

Without bindings, you would have to manually write procedural code to associate the user interface components with the data in the model. Bindings let you handle that concern in the Interface Builder. The result is you wind up constructing your data management application, rather than "coding" it.

If you need a good book to tie up loose ends about how to use Xcode to write Cocoa apps, or Core Data apps in particular, Xcode 3 Unleashed is pretty good.

It does not detail iPhone development, but I assume you have access to documentation that will help you address iPhone-specific limitations and features. Targeting the iPhone means that you will have to use the reference-counting approach to memory management, rather than the newer garbage collection approach that was introduced with Objective-C 2.0. It also means that NSArrayController and Cocoa Bindings, which are AppKit features, are not available; on the iPhone, NSFetchedResultsController is the usual way to feed Core Data results to a UITableView.

The NSString method componentsSeparatedByString: could be used here, though a CSV-specific library might be easier to use. There's a nice article (with code) on parsing CSV data here:

For the parsing I would use the componentsSeparatedByString: method of the NSString class. This works similarly to the split function in Perl or Ruby.
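Applied to the sample file in the question, that approach might look like the sketch below. It is only a sketch under stated assumptions: fields never contain quotes or embedded commas, lines end in "\n", and a line whose first field is non-empty starts a new class. The function name CoursesFromCSV, the dictionary keys, and the one-entry lookup table for meeting patterns are all made up for illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: groups the question's CSV text into an array of
// course dictionaries, each holding a "students" array.
// Assumes no quoted fields and "\n" line endings.
static NSArray *CoursesFromCSV(NSString *csv) {
    // Illustrative lookup table: meeting-pattern code -> readable name.
    // Only the one code mentioned in the question is filled in.
    NSDictionary *patternNames = [NSDictionary dictionaryWithObjectsAndKeys:
        @"Mon/Wed/Fri", @"1", nil];

    NSMutableArray *courses = [NSMutableArray array];
    NSMutableArray *currentStudents = nil;

    for (NSString *line in [csv componentsSeparatedByString:@"\n"]) {
        NSArray *fields = [line componentsSeparatedByString:@","];
        if ([fields count] < 4) continue;   // skip blank or short lines

        if ([[fields objectAtIndex:0] length] > 0) {
            // Non-empty first field: a class line, which starts a new group.
            currentStudents = [NSMutableArray array];
            NSString *code = [fields objectAtIndex:2];
            NSString *pattern = [patternNames objectForKey:code];
            [courses addObject:[NSDictionary dictionaryWithObjectsAndKeys:
                [fields objectAtIndex:3], @"name",
                (pattern ? pattern : code), @"meetingPattern",
                currentStudents, @"students",
                nil]];
        } else if (currentStudents != nil && [fields count] >= 6) {
            // Empty first field: a student line for the current class.
            [currentStudents addObject:[NSDictionary dictionaryWithObjectsAndKeys:
                [fields objectAtIndex:1], @"studentNumber",
                [fields objectAtIndex:2], @"seat",
                [fields objectAtIndex:3], @"lastName",
                [fields objectAtIndex:4], @"firstName",
                [fields objectAtIndex:5], @"year",
                nil]];
        }
    }
    return courses;
}
```

The number of students in a class is then the count of its "students" array, which is the kind of per-section data a table view data source needs. The same grouping logic carries over unchanged if the plain split is later swapped for a quote-aware CSV parser.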

I have a CSV parser for Objective-C that will parse any CSV file you throw at it (and if it fails, let me know so I can fix it).

https://github.com/davedelong/CHCSVParser

You can use RegexKitLite. The documentation has an example of how to do this which is just 17 lines long, and that includes comments. While your mileage may vary, I've generally found it to be one of the fastest ways to parse CSV data, and one of the easiest to modify to suit your needs, since it only takes a couple of lines to do the whole thing.

The article pointed to by @JohnnySoftware is no longer available. I found it in the Wayback Machine and am reproducing its contents below:

On quite a few occasions, MacResearch readers have posted questions asking how you parse CSV (comma-separated values) data in Cocoa. CSV is a simple standard that is used to represent tables; it is used in widely varying fields, from Science to Finance — basically anywhere a table needs to be stored in a text file.

I've recently added CSV import to my flash card application, Mental Case. Before I began, I thought it would be a trivial matter of searching for some Objective-C sample code or an open source library with Google. I found solutions in scripting languages like Python, but nothing Cocoa based. After an hour or two of searching, I realized that if I wanted a Cocoa-native solution, I was going to have to roll my own. In this short tutorial, I will show you what I came up with, and hopefully save you the trouble of doing it yourself.

Simple CSV

Parsing CSV can actually be quite simple, if you know the structure of the data beforehand, and you don't have to deal with quoted strings. In fact, I addressed this in an earlier tutorial that stored spectra in CSV format.

- (BOOL)readFromURL:(NSURL *)absoluteURL ofType:(NSString *)typeName 
    error:(NSError **)outError 
{
    NSString *fileString = [NSString stringWithContentsOfURL:absoluteURL 
        encoding:NSUTF8StringEncoding error:outError];
    if ( nil == fileString ) return NO;
    NSScanner *scanner = [NSScanner scannerWithString:fileString];
    [scanner setCharactersToBeSkipped:
        [NSCharacterSet characterSetWithCharactersInString:@"\n, "]];
    NSMutableArray *newPoints = [NSMutableArray array];
    float energy, intensity;
    while ( [scanner scanFloat:&energy] && [scanner scanFloat:&intensity] ) {
        [newPoints addObject:
            [NSMutableDictionary dictionaryWithObjectsAndKeys:
                [NSNumber numberWithFloat:energy], @"energy",
                [NSNumber numberWithFloat:intensity], @"intensity",
                nil]];
    }
    [self setPoints:newPoints];
    return YES;
}

The NSScanner class is what you use to do most of your string parsing in Cocoa. In the example above, it has been assumed that the CSV file is in a particular form, namely, that it has exactly two columns, each containing a decimal number. By telling the scanner to skip newlines, commas, and spaces

    [scanner setCharactersToBeSkipped:[NSCharacterSet characterSetWithCharactersInString:@"\n, "]];

the parsing of each line is reduced to a single line of code:

    while ( [scanner scanFloat:&energy] && [scanner scanFloat:&intensity] ) {

The scanFloat: method will try to read a floating-point number, returning NO upon failure. So the while loop will continue until the format does not meet expectations.
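The same scan-and-check style can be applied to the class lines in the question's file, which start with three integers and a name. The following is a minimal sketch under that assumption. The function name ParseCourseLine and the dictionary keys are hypothetical, and the name is assumed to contain no commas or quotes.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: reads "type,sections,pattern,name" from the start
// of a class line and ignores whatever follows the name.
// Returns nil if the line does not begin with three integers and a name.
static NSDictionary *ParseCourseLine(NSString *line) {
    NSScanner *scanner = [NSScanner scannerWithString:line];
    NSInteger type = 0, sections = 0, pattern = 0;
    NSString *name = nil;

    // Each scan method returns NO on failure, so && stops at the
    // first field that does not match the expected format.
    BOOL ok = [scanner scanInteger:&type]
           && [scanner scanString:@"," intoString:NULL]
           && [scanner scanInteger:&sections]
           && [scanner scanString:@"," intoString:NULL]
           && [scanner scanInteger:&pattern]
           && [scanner scanString:@"," intoString:NULL]
           && [scanner scanUpToString:@"," intoString:&name];
    if (!ok) return nil;

    return [NSDictionary dictionaryWithObjectsAndKeys:
        [NSNumber numberWithInteger:type], @"type",
        [NSNumber numberWithInteger:sections], @"sections",
        [NSNumber numberWithInteger:pattern], @"pattern",
        name, @"name",
        nil];
}
```

A student line, which begins with a comma, fails the first scanInteger: call and so returns nil. That gives a simple way to tell the two kinds of line apart.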

General CSV

As you can see, parsing CSV data can be very easy, but it is not always the case. When you have to deal with general CSV data, things can get quite complicated, because you have to take account of the possibility that strings contain quotations, and can even extend over multiple lines. For example, the following is a valid line of CSV data, containing two columns:

"The quick, brown fox", "jumped over the ""lazy"", dog"

In case you haven't figured it out, each doubled quotation mark inside a quoted field stands for a single literal quotation mark, giving the two strings 'The quick, brown fox' and 'jumped over the "lazy", dog'.

Parsing this general form of CSV is considerably more difficult than the simple form, and it took me quite a while to come up with some clean code to do it. But I think I succeeded in the end. Here it is: (Update: I have changed this code to properly handle all newline varieties.)

@implementation NSString (ParsingExtensions)

-(NSArray *)csvRows {
    NSMutableArray *rows = [NSMutableArray array];

    // Get newline character set
    NSMutableCharacterSet *newlineCharacterSet = (id)[NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
    [newlineCharacterSet formIntersectionWithCharacterSet:[[NSCharacterSet whitespaceCharacterSet] invertedSet]];

    // Characters that are important to the parser
    NSMutableCharacterSet *importantCharactersSet = (id)[NSMutableCharacterSet characterSetWithCharactersInString:@",\""];
    [importantCharactersSet formUnionWithCharacterSet:newlineCharacterSet];

    // Create scanner, and scan string
    NSScanner *scanner = [NSScanner scannerWithString:self];
    [scanner setCharactersToBeSkipped:nil];
    while ( ![scanner isAtEnd] ) {        
        BOOL insideQuotes = NO;
        BOOL finishedRow = NO;
        NSMutableArray *columns = [NSMutableArray arrayWithCapacity:10];
        NSMutableString *currentColumn = [NSMutableString string];
        while ( !finishedRow ) {
            NSString *tempString;
            if ( [scanner scanUpToCharactersFromSet:importantCharactersSet intoString:&tempString] ) {
                [currentColumn appendString:tempString];
            }

            if ( [scanner isAtEnd] ) {
                if ( ![currentColumn isEqualToString:@""] ) [columns addObject:currentColumn];
                finishedRow = YES;
            }
            else if ( [scanner scanCharactersFromSet:newlineCharacterSet intoString:&tempString] ) {
                if ( insideQuotes ) {
                    // Add line break to column text
                    [currentColumn appendString:tempString];
                }
                else {
                    // End of row
                    if ( ![currentColumn isEqualToString:@""] ) [columns addObject:currentColumn];
                    finishedRow = YES;
                }
            }
            else if ( [scanner scanString:@"\"" intoString:NULL] ) {
                if ( insideQuotes && [scanner scanString:@"\"" intoString:NULL] ) {
                    // Replace double quotes with a single quote in the column string.
                    [currentColumn appendString:@"\""]; 
                }
                else {
                    // Start or end of a quoted string.
                    insideQuotes = !insideQuotes;
                }
            }
            else if ( [scanner scanString:@"," intoString:NULL] ) {  
                if ( insideQuotes ) {
                    [currentColumn appendString:@","];
                }
                else {
                    // This is a column separating comma
                    [columns addObject:currentColumn];
                    currentColumn = [NSMutableString string];
                    [scanner scanCharactersFromSet:[NSCharacterSet whitespaceCharacterSet] intoString:NULL];
                }
            }
        }
        if ( [columns count] > 0 ) [rows addObject:columns];
    }

    return rows;
}
@end

(I'm releasing this code into the public domain, so use it as you please.) This code is designed to be a category of NSString. The idea is that it will parse a string into rows and columns, under the assumption that it is in CSV format. The result is an array of arrays; entries in the containing array represent the rows, and those in the contained arrays represent columns in each row.

The code itself is fairly straightforward: It consists of a big while loop which continues until the whole string is parsed. An inner while loop looks through each row of CSV data, looking for significant landmarks, like an end of line, an opening or closing quotation mark, or a comma. By keeping track of opening and closing quotation marks, it is able to properly deal with commas and newlines embedded in quoted strings.

Conclusions

NSScanner is a useful class for parsing strings. It may not be quite as powerful as the regular expressions found in scripting languages like Perl and Python, but with just a few methods (e.g., scanString:intoString:, scanUpToCharactersFromSet:intoString:, and scanFloat:) you can achieve an awful lot. If you need to do any basic string parsing in one of your Cocoa projects, give it a look.

You might want to check out Matt Gallagher's "Cocoa With Love" article on parsing CSV: http://cocoawithlove.com/2009/11/writing-parser-using-nsscanner-csv.html. He's written a full grammar and a couple of useful classes you can pop right into your project.
Howard

