Importing large datasets on iPhone using CoreData

Question

I'm facing a very annoying problem. My iPhone app loads its data from a network server. The data is sent as a plist and, once parsed, needs to be stored in a SQLite database using CoreData.

The issue is that in some cases those datasets are too big (5000+ records) and the import takes way too long. What's more, when the iPhone tries to suspend the screen, the watchdog kills the app because it is still processing the import and does not respond within 5 seconds, so the import never finishes.

I used all the recommended techniques from the article "Efficiently Importing Data" (http://developer.apple.com/mac/library/DOCUMENTATION/Cocoa/Conceptual/CoreData/Articles/cdImporting.html) and other docs on the subject, but it's still awfully slow.

The solution I'm looking for is either to let the app suspend but keep the import running in the background (the better option), or to prevent attempts to suspend the app at all. Any better idea is welcome too.

Any tips on how to overcome these issues are highly appreciated! Thanks

Answer 1

Instead of pushing plist files to the phone, you might want to send ready-to-use SQLite files. This has clear advantages:

no need to import on the phone
more compact

If you always replace the whole content, simply overwrite the persistent store on the device. Otherwise, you may want to maintain an array, stored as a plist, of all the SQLite files you have downloaded, and then use it to add all the stores to the persistentStoreCoordinator.

Bottom line: use several precompiled SQLite files and add them to the persistentStoreCoordinator.

You can use the iPhone Simulator to create those CoreData SQLite stores, or use a standalone Mac app. Either way, you will need to write the tool yourself.
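As a rough sketch of the "add them to the persistentStoreCoordinator" step, something like the following could work. The helper name AddPrebuiltStores and the choice to open the stores read-only are assumptions made for illustration, and the downloaded files must have been created with a model compatible with the one the coordinator uses.

```objc
#import <CoreData/CoreData.h>

// Adds every SQLite file in storeURLs to the coordinator.
// Returns the number of stores that were added successfully.
// (AddPrebuiltStores is a hypothetical helper, not part of Core Data.)
static NSUInteger AddPrebuiltStores(NSPersistentStoreCoordinator *coordinator,
                                    NSArray<NSURL *> *storeURLs)
{
    // Assumption: downloaded stores are reference data, so open them read-only.
    NSDictionary *options = @{ NSReadOnlyPersistentStoreOption : @YES };
    NSUInteger added = 0;

    for (NSURL *url in storeURLs) {
        NSError *error = nil;
        NSPersistentStore *store =
            [coordinator addPersistentStoreWithType:NSSQLiteStoreType
                                      configuration:nil
                                                URL:url
                                            options:options
                                              error:&error];
        if (store != nil) {
            added += 1;
        } else {
            NSLog(@"Could not add store %@: %@", url, error);
        }
    }
    return added;
}
```

The list of URLs would come from the plist of downloaded files mentioned above; a fetch against the coordinator then sees the objects from all added stores.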

Answer 2

First, if you can package the data with the app, that would be ideal.

However, assuming you cannot do that, I would do the following:

Once the data is downloaded, break it into multiple files before import.
Import on a background thread, one file at a time.
Once a file has been imported and saved, delete the import file.
On launch, look for files still waiting to be processed and pick up where you left off.

Ideally you would ship the data with the app, which is far less work, but the second solution will work, and you can fine-tune how the data is broken up during development.
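The four steps above could be sketched roughly as follows. The function names, the chunk file naming scheme, and the importChunk block are all hypothetical; the block stands in for whatever code inserts the records into Core Data and saves.

```objc
#import <Foundation/Foundation.h>

// Splits records into plist files of at most chunkSize entries inside dir.
// File names are zero-padded so that sorting by name preserves the order.
static NSUInteger WriteChunks(NSArray *records, NSUInteger chunkSize, NSURL *dir)
{
    if (chunkSize == 0) return 0;
    NSUInteger written = 0;
    for (NSUInteger start = 0; start < records.count; start += chunkSize) {
        NSUInteger length = MIN(chunkSize, records.count - start);
        NSArray *chunk = [records subarrayWithRange:NSMakeRange(start, length)];
        NSString *name = [NSString stringWithFormat:@"chunk-%05lu.plist",
                                                    (unsigned long)written];
        if (![chunk writeToURL:[dir URLByAppendingPathComponent:name] atomically:YES]) {
            break; // stop at the first write failure
        }
        written += 1;
    }
    return written;
}

// Imports pending chunk files in name order. importChunk must insert the
// records and save; a file is deleted only after the block returns YES.
// Returns the number of files processed.
static NSUInteger ProcessPendingChunks(NSURL *dir,
                                       BOOL (^importChunk)(NSArray *records))
{
    NSFileManager *fm = [NSFileManager defaultManager];
    NSArray<NSURL *> *files =
        [fm contentsOfDirectoryAtURL:dir
          includingPropertiesForKeys:nil
                             options:NSDirectoryEnumerationSkipsHiddenFiles
                               error:NULL];
    files = [files sortedArrayUsingComparator:^NSComparisonResult(NSURL *a, NSURL *b) {
        return [a.lastPathComponent compare:b.lastPathComponent];
    }];

    NSUInteger processed = 0;
    for (NSURL *file in files) {
        if (![file.pathExtension isEqualToString:@"plist"]) continue;
        @autoreleasepool {
            NSArray *records = [NSArray arrayWithContentsOfURL:file];
            if (records == nil || !importChunk(records)) break;
            [fm removeItemAtURL:file error:NULL];
            processed += 1;
        }
    }
    return processed;
}
```

Calling ProcessPendingChunks on a background thread at launch picks up whatever files are left. If the app is killed between the save and the delete, that one chunk would be imported twice, so the import code may need to tolerate duplicates.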

Answer 3

I solved a similar problem by putting the insert processing in a background thread. But first I created a progress alert so the user couldn't manipulate the data store while the entries were being inserted.

This is basically the view controller's viewDidLoad:

- (void)viewDidLoad 
{
    [super viewDidLoad];

    NSError *error = nil;
    if (![[self fetchedResultsController] performFetch:&error]) {
        NSLog(@"Unresolved error %@, %@", error, [error userInfo]);
        abort();
    }

    // Only insert those not imported, here I know it should be 2006 entries
    if ([self tableView:nil numberOfRowsInSection:0] != 2006) {

        // Put up an alert with a progress bar, need to implement
        [self createProgressionAlertWithMessage:@"Initializing database"];

        // Spawn the insert thread making the app still "live" so it 
        // won't be killed by the OS
        [NSThread detachNewThreadSelector:@selector(loadInitialDatabase:) 
                                 toTarget:self 
                      withObject:[NSNumber numberWithInt:[self tableView:nil 
                                                numberOfRowsInSection:0]]];
    }
}

The insert thread was done like this:

- (void)loadInitialDatabase:(NSNumber*)number
{
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];

    int done = [number intValue]+1; // How many done so far

    // I load from a textfile (csv) but imagine you should be able to 
    // understand the process and make it work for your data
    NSString *file = [NSString stringWithContentsOfFile:[[NSBundle mainBundle]
                                                pathForResource:@"filename"
                                                         ofType:@"txt"] 
                                               encoding:NSUTF8StringEncoding
                                                  error:nil];

    NSArray *lines = [file componentsSeparatedByString:@"\n"];

    float num = [lines count];
    float i = 0;
    int perc = 0;

    for (NSString *line in lines) {
        i += 1.0;

        if ((int)(i/(num*0.01)) != perc) {
            // This part updates the alert with a progress bar
            // setProgressValue: needs to be implemented 
            [self performSelectorOnMainThread:@selector(setProgressValue:) 
                                   withObject:[NSNumber numberWithFloat:i/num] 
                                waitUntilDone:YES]; 
            perc = (int)(i/(num*0.01));
        }

        if (done < i) // keep track of how much done previously
            [self insertFromLine:line]; // Add to data storage...

    }

    progressView = nil;
    [progressAlert dismissWithClickedButtonIndex:0 animated:YES]; 
    [pool release];
}

It's a bit crude this way: it tries to initialize the data store from where it left off if the user happened to stop it on a previous run.

Answer 4

I had a similar problem importing many objects into CoreData. Initially I was doing a save on the managed object context after every object I created and inserted.

What you should do is create and initialize each object you want to store in CoreData and, only after you have looped through all your remote data and created the objects, do a single managed object context save.

I guess you could look at this as doing a transaction in a SQLite database: begin transaction, do lots of inserts/updates, end transaction.

If this is still too slow, just move the darn task to a background thread and prevent user interaction until it completes.
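A minimal sketch of saving once per batch rather than once per object might look like this. ImportRecords and its parameters are illustrative names; the sketch assumes each record is a dictionary whose keys match the entity's attribute names, and that the function is called on the thread or queue that owns the context.

```objc
#import <CoreData/CoreData.h>

// Inserts one managed object per dictionary and saves every batchSize inserts
// instead of after each one. A batchSize of 0 means a single save at the end.
// Returns NO at the first failed save.
static BOOL ImportRecords(NSArray<NSDictionary *> *records,
                          NSString *entityName,
                          NSManagedObjectContext *context,
                          NSUInteger batchSize,
                          NSError **error)
{
    NSUInteger pending = 0;
    for (NSDictionary *record in records) {
        NSManagedObject *object =
            [NSEntityDescription insertNewObjectForEntityForName:entityName
                                          inManagedObjectContext:context];
        // Assumption: dictionary keys match the entity's attribute names.
        [object setValuesForKeysWithDictionary:record];

        pending += 1;
        if (batchSize > 0 && pending >= batchSize) {
            if (![context save:error]) return NO;
            pending = 0;
        }
    }
    // One final save covers the remainder (or everything if batchSize is 0).
    if (pending > 0 && ![context save:error]) return NO;
    return YES;
}
```

A single save at the end is the closest match to the transaction analogy; an intermediate batch size trades a little speed for lower peak memory.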

Answer 5

I work on an app that regularly has to process 100K inserts, deletes, and updates with Core Data. If it is choking on 5K inserts, there is some optimization to be done.

First, create an NSOperation subclass for processing the data and override its -main method to do the processing. This method is not guaranteed to run on the main thread; indeed, its purpose is to keep costly code off the main thread, where it would freeze the UI and hurt the user experience. So within -main you need to create another managed object context, one that shares the persistent store coordinator of your main thread's managed object context:

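- (void)main
{
  NSManagedObjectContext *ctx = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
  [ctx setPersistentStoreCoordinator:mainManagedObjectContext.persistentStoreCoordinator];
  [ctx setUndoManager:nil];
  // A private-queue context should only be used from inside its own block.
  [ctx performBlockAndWait:^{
    // Do your insertions here!
    NSError *error = nil;
    if (![ctx save:&error]) {
      NSLog(@"Import save failed: %@, %@", error, [error userInfo]);
    }
  }];
}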

Given your circumstances, I don't believe you need an undo manager. Having one incurs a performance penalty because Core Data has to track your changes.

Use THIS context to perform all of your CRUD actions in the -main method, then save it. Whatever owns your main thread's managed object context must register for the NSNotification named NSManagedObjectContextDidSaveNotification. Register like so:

[[NSNotificationCenter defaultCenter] addObserver:self selector:@selector(mocDidSaveNotification:) name:NSManagedObjectContextDidSaveNotification object:nil];

Then define that selector:

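- (void)mocDidSaveNotification:(NSNotification *)notification
{
  NSManagedObjectContext *ctx = [notification object];
  // Ignore saves made by the main context itself.
  if (ctx == mainManagedObjectContext) return;
  // The notification arrives on the thread that saved, so merge on the main thread.
  dispatch_async(dispatch_get_main_queue(), ^{
    [mainManagedObjectContext mergeChangesFromContextDidSaveNotification:notification];
  });
}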

When all of this comes together, it allows you to perform long-running operations on background threads without blocking the UI thread. There are several variations of this architecture, but the central theme is this: process on a background thread, merge on the main thread, update your UI. Some other things to keep in mind:

(1) Keep an autorelease pool around during your processing and drain it every so often to keep your memory consumption down. In our case, we do it every 1000 objects. Adjust for your needs, but keep in mind that draining can be expensive depending on the amount of memory required per object, so you don't want to do it too often.

(2) Try to pare your data down to the absolute minimum you need for a functional app. By reducing the amount of data to parse, you reduce the time required to save it.

(3) With this multithreaded approach, you can process your data concurrently. Create 3-4 instances of your NSOperation subclass, each of which processes only a portion of the data, so that they all run concurrently and the data set is parsed in less wall-clock time.
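Points (1) and (3) could be sketched along these lines, using plain Foundation so the pattern is visible without a Core Data stack. All three function names are hypothetical, and the handler block stands in for the per-record work; in a real import each concurrent operation would create its own managed object context, as in -main above.

```objc
#import <Foundation/Foundation.h>

// Splits items into at most sliceCount contiguous slices of near-equal size.
static NSArray<NSArray *> *SplitIntoSlices(NSArray *items, NSUInteger sliceCount)
{
    NSMutableArray<NSArray *> *slices = [NSMutableArray array];
    if (sliceCount == 0 || items.count == 0) return slices;
    NSUInteger size = (items.count + sliceCount - 1) / sliceCount; // ceiling
    for (NSUInteger start = 0; start < items.count; start += size) {
        NSUInteger length = MIN(size, items.count - start);
        [slices addObject:[items subarrayWithRange:NSMakeRange(start, length)]];
    }
    return slices;
}

// Runs handler once per item, draining an autorelease pool every drainEvery items.
static void ProcessWithPeriodicDrain(NSArray *items, NSUInteger drainEvery,
                                     void (^handler)(id item))
{
    if (drainEvery == 0) drainEvery = 1000; // the figure quoted in point (1)
    for (NSUInteger start = 0; start < items.count; start += drainEvery) {
        @autoreleasepool {
            NSUInteger end = MIN(start + drainEvery, items.count);
            for (NSUInteger i = start; i < end; i++) {
                handler(items[i]);
            }
        }
    }
}

// Processes every slice concurrently and returns when all of them are done.
// It blocks the calling thread, so do not call it from the main thread.
static void ProcessConcurrently(NSArray *items, NSUInteger sliceCount,
                                void (^handler)(id item))
{
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    queue.maxConcurrentOperationCount = (NSInteger)sliceCount;
    for (NSArray *slice in SplitIntoSlices(items, sliceCount)) {
        [queue addOperationWithBlock:^{
            ProcessWithPeriodicDrain(slice, 1000, handler);
        }];
    }
    [queue waitUntilAllOperationsAreFinished];
}
```

Because the handler runs on several threads at once, anything it shares across slices has to be thread-safe.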

Answer 6

Is there any way you can pack the data ahead of time, say during development, so that when you push the app to the store some of the data is already there? That would cut down on the amount of data you have to pull, which would help with this issue.

If the data is time-sensitive, or not ready, or for whatever reason you can't do that, could you compress the data using zlib before you ship it over the network?

Or is the problem that the phone dies doing 5K+ inserts?

Answer 7

I imagine you aren't showing all 5K records to the client? I'd recommend doing all of the aggregation you need on the server and sending only the necessary data to the phone. Even if this involves generating a few different data views, it will still be orders of magnitude faster than sending (and then processing) all those rows on the iPhone.

Are you also processing the data in a separate (non-event/UI) thread?

Answer 8

Any chance you can set up your server side to expose a RESTful web service for processing your data? I had a similar issue and was able to expose my information through a RESTful web service. There are some libraries on the iPhone that make reading from a web service like that very easy. I chose to request JSON from the service and used the SBJSON library on the iPhone to quickly convert the results into dictionaries for easy use. I used the ASIHTTP library to make the web requests, queue up follow-up requests, and run them in the background.

The nice thing about REST is that it gives you a built-in way to grab batches of information, so you don't need to arbitrarily figure out how to break up the files you want to import. You just set how many records you want to get back, and on the next request you skip that many records. I don't know if that is even an option for you, so I'm not going into a lot of code examples right now, but if it is possible, it may be a smooth way to handle it.
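The "ask for N records, then skip N on the next request" loop could be sketched as below. The query parameter names limit and offset, the function names, and the fetchPage block are assumptions; the block stands in for whatever HTTP and JSON libraries you use and is expected to return the decoded records for one page.

```objc
#import <Foundation/Foundation.h>

// Builds the URL for one page. The "limit" and "offset" query names are
// hypothetical; use whatever your service defines. Any existing query
// string on baseURL is replaced.
static NSURL *PageURL(NSURL *baseURL, NSUInteger limit, NSUInteger offset)
{
    NSURLComponents *components = [NSURLComponents componentsWithURL:baseURL
                                             resolvingAgainstBaseURL:NO];
    NSString *limitValue = [NSString stringWithFormat:@"%lu", (unsigned long)limit];
    NSString *offsetValue = [NSString stringWithFormat:@"%lu", (unsigned long)offset];
    components.queryItems = @[
        [NSURLQueryItem queryItemWithName:@"limit" value:limitValue],
        [NSURLQueryItem queryItemWithName:@"offset" value:offsetValue],
    ];
    return components.URL;
}

// Requests page after page until one comes back empty or short.
// Returns the total number of records handed to handlePage.
static NSUInteger FetchAllPages(NSURL *baseURL, NSUInteger pageSize,
                                NSArray *(^fetchPage)(NSURL *pageURL),
                                void (^handlePage)(NSArray *records))
{
    if (pageSize == 0) return 0;
    NSUInteger total = 0;
    for (;;) {
        @autoreleasepool {
            NSArray *records = fetchPage(PageURL(baseURL, pageSize, total));
            if (records.count == 0) break;      // nothing left (or the fetch failed)
            handlePage(records);                // e.g. insert and save this batch
            total += records.count;
            if (records.count < pageSize) break; // a short page is the last page
        }
    }
    return total;
}
```

Each page is a natural unit to insert and save before requesting the next one, which keeps memory use bounded by the page size.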

Answer 9

Let's accept that REST (lazy loading) is not an option... I understand you want to replicate. If the load problem is of the type "fewer and fewer rows loading in more and more time", then in pseudocode:

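[self sqlDropIndex:offendingIndexName];   // drop the suspect index
[self breatheInOverIP];                    // pull the data over the network
[self breatheOutToSQLite];                 // write it to SQLite
[self sqlAddIndex:offendingIndexName];    // rebuild the index afterwards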

This should tell you lots.
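For a SQLite database that you manage yourself, the drop, load, re-index experiment might be sketched as follows. The table records(id, name), the index idx_records_name, and the function name are hypothetical, and the sketch assumes no transaction is already open on the connection. Core Data's own SQLite schema is private, so treat this as a diagnostic against a plain SQLite file rather than something to run on a Core Data store.

```objc
#import <Foundation/Foundation.h>
#import <sqlite3.h>

// Drops an index, bulk-inserts rows, and rebuilds the index, all in one
// transaction. Returns NO (after rolling back) if any step fails.
static BOOL BulkInsertWithoutIndex(sqlite3 *db, NSArray<NSString *> *names)
{
    // Schema changes are transactional in SQLite, so a ROLLBACK also
    // restores the dropped index.
    BOOL ok = sqlite3_exec(db, "BEGIN TRANSACTION;"
                               "DROP INDEX IF EXISTS idx_records_name;",
                           NULL, NULL, NULL) == SQLITE_OK;

    sqlite3_stmt *stmt = NULL;
    ok = ok && sqlite3_prepare_v2(db, "INSERT INTO records(name) VALUES (?);",
                                  -1, &stmt, NULL) == SQLITE_OK;
    for (NSString *name in names) {
        if (!ok) break;
        sqlite3_bind_text(stmt, 1, name.UTF8String, -1, SQLITE_TRANSIENT);
        ok = (sqlite3_step(stmt) == SQLITE_DONE);
        sqlite3_reset(stmt);
    }
    sqlite3_finalize(stmt); // harmless no-op when stmt is NULL

    ok = ok && sqlite3_exec(db, "CREATE INDEX idx_records_name ON records(name);"
                                "COMMIT;",
                            NULL, NULL, NULL) == SQLITE_OK;
    if (!ok) {
        sqlite3_exec(db, "ROLLBACK;", NULL, NULL, NULL);
    }
    return ok;
}
```

Timing this against the same load with the index left in place would show whether index maintenance is what makes the inserts slow down as the table grows.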

9 answers

Answer 1 4 2010-01-26 16:45:03

Answer 2 4 2010-01-27 08:23:16

Answer 3 2 2010-01-26 16:53:55

Answer 4 1 2011-02-16 18:24:46

Answer 5 0 2013-12-16 18:42:46

Answer 6 0 2010-01-26 16:36:44

Answer 7 0 2010-01-26 16:53:15

Answer 8 0 2010-03-05 18:31:55

Answer 9 0 2010-03-26 18:18:55
