Benedict's Soapbox

NSManagedObjectContext's parentContext

OS X 10.7 and iOS 5 introduced new features to Core Data. Initially these features seemed like they would smooth out some of the pain points that have arisen as the use cases for Core Data have evolved. Specifically, it seemed that they would make concurrency simpler. Unfortunately this hasn’t transpired, and over two years after their introduction there is still confusion surrounding how these features behave. This post describes my understanding of one of these features, NSManagedObjectContext’s parentContext property.

parentContext is not about concurrency

Unfortunately the documentation for parentContext is both sparse and misleading. As a result there’s a great deal of misunderstanding.

parentContext was introduced alongside a new concurrency model. To use parentContext both the parent and child contexts must adopt the new concurrency model. But the problem addressed by parentContext is not concurrency. Concurrency is just a problem, albeit a significant one, that needed to be solved for parentContext to be implemented. The intent of parentContext is to improve the atomicity of changes. parentContext allows changes to be batched up and committed en masse. This has always been possible by using multiple NSManagedObjectContext instances, but parentContext allows for improved granularity of the batching.

But why does parentContext present concurrency problems?

Prior to the introduction of the new concurrency model, concurrency was only supported through the confinement pattern. The confinement pattern dictates that all messages sent to a context must be sent on the thread that the context was created on. This puts the burden of thread safety on the objects that use the context; Core Data takes no responsibility for ensuring threading correctness. From the perspective of the Core Data framework this wasn’t a problem because the only Core Data objects that would directly message a context were instances of NSManagedObject and these instances were subject to the same confinement restrictions as the context.

But parentContext doesn’t fit with the restrictions of the confinement pattern. A context using parentContext needs to send messages to its parent, but the child does not know which thread the parent was created on and so cannot meet the restrictions of the confinement pattern.

The new concurrency model addresses the problem by moving some of the burden of threading from the calling objects to the context. performBlock: (and performBlockAndWait:) allow the context to be messaged without having to know which thread the context was created on, thus allowing features like parentContext to be implemented without violating the confinement pattern.

How changes propagate

Changes made in a context propagate in two directions: up to their ancestors and down to their children. When a change is made in a context, but not saved, it is visible to all of its descendants but not to its ancestors. When a context is saved the changes are pushed up to its direct ancestor, i.e. the value of its parentContext property. But for the changes to persist further up the ancestor chain a save must be repeated on each parent context. When a save is performed on a root context, i.e. a context without a parent context, it is pushed to the persistent store (via the persistent store coordinator) and becomes visible to all contexts connected to the store.
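The "save must be repeated on each parent context" rule can be sketched as a small helper. This is only an illustration: EMKSaveContextAndAncestors is a hypothetical name, not part of Core Data, and it assumes that blocking the calling thread until every save has finished is acceptable.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper (not a Core Data API): saves `context`, then each of its
// ancestors in turn, so the changes travel all the way to the persistent store.
// Blocks the calling thread until every save has finished.
static BOOL EMKSaveContextAndAncestors(NSManagedObjectContext *context, NSError **outError) {
    NSManagedObjectContext *current = context;
    while (current != nil) {
        __block BOOL didSave = YES;
        __block NSError *saveError = nil;
        __block NSManagedObjectContext *parent = nil;
        // Each context is only messaged on its own queue.
        [current performBlockAndWait:^{
            if ([current hasChanges]) {
                didSave = [current save:&saveError];
            }
            parent = current.parentContext;
        }];
        if (!didSave) {
            if (outError != NULL) *outError = saveError;
            return NO;
        }
        // A nil parent means we have just saved the root context.
        current = parent;
    }
    return YES;
}
```

Stopping at the first failed save leaves the remaining ancestors untouched, which seemed the least surprising choice for a sketch; a real app may prefer asynchronous saves with performBlock:.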

That’s relatively simple, but unsurprisingly there are a few gotchas…

Contexts use a cache hierarchy

Each context maintains its own cache. When a child context performs a fetch it will receive the result from the cache of the closest ancestor that contains the data, or from the persistent store if none of the ancestors contain the required data. This behaviour can prevent changes made in an ancestor context being visible to a descendant. Consider the following context hierarchy and sequence of events:

persistentStoreCoordinator -> rootContext -> mainContext -> childContext

  1. mainContext fetches objectA
  2. rootContext fetches objectA
  3. rootContext updates objectA
  4. childContext fetches objectA

In this example childContext will not see the updated value in rootContext; it will see the value in mainContext which, depending on the semantics of the data model, may be incorrect.

The child context’s merge policy is ignored

parentContext doesn’t use the existing merge functionality, NSMergePolicy, for handling saves from a child context. When a child context is saved the changes will always overwrite the parent’s data.

There’s an edge case which is worth noting: If a child context is saved but the values do not differ from the cached values then the save is ignored. This only becomes an issue when changes from the parent are not merged into children. (I’ll come back to merging and keeping contexts in sync later.)

awakeFromInsert doesn’t behave as documented

awakeFromInsert is called when an object is first created, so you’d expect this method to only ever be called once. And that’s exactly what the documentation says:

This method is invoked only once in the object’s lifetime.

But that’s not true. When an inserted object propagates up to a parent context (due to the child being saved) awakeFromInsert is called again in the parent context. awakeFromInsert is called as part of the save; the object doesn’t need to be explicitly accessed in the parent context. When awakeFromInsert is called in a parent context the object is ‘empty’; it does not yet contain the values from the child context. If we only inspect the object it is impossible to tell if it has been genuinely inserted or if it’s propagating from a child context.

Objects inserted into a child context don’t receive their permanent object ID from a save

When an object is inserted into a context it is assigned a temporary object ID. If the context is a root context then saving will cause the temporary object ID to be replaced with a permanent object ID. When a save is performed on a child context the object does not receive a permanent ID; it just keeps its temporary object ID. This is expected as the object has not yet been committed to the persistent store, which is responsible for assigning permanent object IDs. So far, so good, but now things get strange.

When an object that was created in a child context is finally committed to the persistent store its object ID is only replaced in the root context; the permanent object ID does not bubble down to the child contexts. Once saved it’s possible to retrieve the object in any context using its permanent object ID. It is also still possible to retrieve the object using its temporary object ID but only in contexts that are ancestors of the context in which the object was inserted (i.e. it is not possible to use the temporary object ID in a different context branch).

Although these two objects (the object retrieved with the temporary ID and the object retrieved with the permanent ID) conceptually refer to the same object they are treated as separate objects (i.e. [tempIDObj isEqual:permIDObj] == NO). If an NSFetchRequest is executed only the object with the temporary object ID is returned. When the context is saved the updates to the object with the temporary object ID take precedence over those to the object with the permanent object ID.

Yikes!

Advice

Keep it simple!

Stick with the simplest possible setup. Specifically, a single mainQueue context connected directly to the persistent store coordinator. Only augment this setup when there’s a compelling reason to do so. I’ll refer to the context in this simple setup as the primary context. I can think of only three use cases for using additional contexts:

1. A resource-intensive operation which may block the main thread

The most common example of this is importing data from a web service. Perform resource-intensive operations by creating a privateQueue context attached directly to the persistent store coordinator. This was the only way to use multiple contexts prior to the introduction of parentContext and it is still useful. This use case is the trickiest to implement correctly because, unlike the next two use cases, it inherently involves concurrent modifications to the object graph. (The fact that concurrency is hard with Core Data is not primarily a failing of Core Data; it’s because concurrency is always a tricky problem.) To minimise the complexities inherent in concurrency it is beneficial to limit the number of concurrent operations. This can be done by using an NSOperationQueue and encapsulating the work inside NSOperation subclasses. These operations should use a privateQueue context, constructed inside init, with the work done inside -main and performBlockAndWait:. For example:

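@implementation EMKJSONImportOperation

//Assumes _persistentStoreCoordinator and _context are declared by the class's interface
-(id)initWithPersistentStoreCoordinator:(NSPersistentStoreCoordinator *)persistentStoreCoordinator {
    NSParameterAssert(persistentStoreCoordinator);
    self = [super init];
    if (self == nil) return nil;
    _persistentStoreCoordinator = persistentStoreCoordinator;
    //Create a privateQueue context attached directly to the persistent store coordinator
    NSManagedObjectContext *context = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    context.persistentStoreCoordinator = persistentStoreCoordinator;
    _context = context;
    return self;
}

-(void)main {
    [_context performBlockAndWait:^{
        //Import the data and save _context here
    }];
}

@end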

2. A group of changes which may be discarded

The most common example is when presenting a modal user interface that the user is able to discard, for example editing a contact. Perform such changes in a mainQueue context with its parent context set to the primary context. If the user decides to cancel then simply discard the child context. If the user commits the changes then save the child context, thus incorporating the changes back into the primary context. Depending on the specifics of the model you may have to merge changes from the primary context into the child context.
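A minimal sketch of this second use case follows. The function names are hypothetical, and the code assumes it is called on the main thread because both contexts are mainQueue contexts.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: creates a scratch context for a cancellable edit.
// The child is a mainQueue context whose parent is the primary context.
static NSManagedObjectContext *EMKMakeEditingContext(NSManagedObjectContext *primaryContext) {
    NSManagedObjectContext *editingContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
    editingContext.parentContext = primaryContext;
    return editingContext;
}

// Hypothetical helper: the user tapped 'Done'. Saving the child pushes its
// changes into the primary context; it does not write to the persistent store.
static BOOL EMKCommitEditingContext(NSManagedObjectContext *editingContext, NSError **error) {
    if (![editingContext hasChanges]) return YES;
    return [editingContext save:error];
}

// There is deliberately no 'cancel' helper: to discard the edits, release the
// editing context without saving it.
```

After a commit the primary context holds the edits as unsaved changes, so it still needs its own save before anything reaches the store.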

3. To delay modifying the persistent store

The use case for this is ensuring correct behaviour for the ‘Save’ command in an OS X app. To achieve this insert a privateQueue context between the persistent store coordinator and the primary context. This new context, our ‘proxy context’, effectively acts as a proxy for the persistent store coordinator. No operations should be performed directly on the proxy context. Root contexts (like those described in the first use case) should be attached to the proxy context instead of the persistent store coordinator. We can now implement the ‘Save’ action by simply calling save: on the proxy context (be sure to call save: inside of performBlock: or performBlockAndWait:).

The downside to this setup is that we lose the merge policy functionality; the merge policy effectively becomes ‘last commit to the proxy context wins’. (A small advantage of this setup is that disk writing will not occur on the main queue and thus reduces the possibility of blocking the UI. But this should be viewed as a perk of the setup rather than a specific intent. If writing to disk is causing the UI to block then you have bigger problems to deal with.)
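The proxy arrangement might be wired up roughly as follows. Both function names are hypothetical, and the sketch assumes the primary context is created on the main thread and that a synchronous ‘Save’ is acceptable.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: builds coordinator -> proxy context -> primary context.
// The primary context keeps a strong reference to its parent, so returning
// only the primary context keeps the proxy alive as well.
static NSManagedObjectContext *EMKMakePrimaryContextWithProxy(NSPersistentStoreCoordinator *coordinator) {
    NSManagedObjectContext *proxyContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    proxyContext.persistentStoreCoordinator = coordinator;

    NSManagedObjectContext *primaryContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
    primaryContext.parentContext = proxyContext;
    return primaryContext;
}

// Hypothetical 'Save' action: writes whatever has been pushed into the proxy
// context to the persistent store. The save runs on the proxy's private queue.
static BOOL EMKPerformSaveCommand(NSManagedObjectContext *proxyContext, NSError **outError) {
    __block BOOL didSave = NO;
    __block NSError *saveError = nil;
    [proxyContext performBlockAndWait:^{
        didSave = [proxyContext save:&saveError];
    }];
    if (!didSave && outError != NULL) *outError = saveError;
    return didSave;
}
```

A caller would pass primaryContext.parentContext to the save function. Changes only reach the proxy once the primary context has itself been saved.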

(Side note: On iOS the file system, and hence the ‘Save’ command, are abstracted away from the user. This raises the question, when should an iOS app perform a save? My approach is to save after each tick of the runloop provided the primary context has changed. This is implemented by using a common base class for all managed objects which asynchronously calls save: on the primary context in didChangeValueForKey:. The advantages of this approach are that there are no calls to save: littered throughout the code and it forces the primary context to always be in a consistent state at the end of the runloop.)

Children of a mainQueue context should only be other mainQueue contexts

When data is requested in a child context which it cannot satisfy, the child context walks up the context hierarchy until it finds the required data. This process causes each ancestor context to block, thus affecting the performance of said ancestors. Due to this potential blocking one must be careful when working with mainQueue contexts because blocking a mainQueue context will block the UI. To avoid blocking the UI it is wise to only attach other mainQueue contexts to a mainQueue context.

Discard child contexts after saving

To avoid the many potential pitfalls that arise from context caches it is best to treat non-primary contexts as ‘one use only’. Once a context has been saved it should be disposed of. This applies to both background operations (as described in the first use case) and mainQueue contexts (second use case).

Keep mainQueue contexts in sync with their parent

MainQueue contexts, such as those described in the second use case, may need to be kept in sync with their parent context. We do this by refreshing the objects when change notifications are posted. For a change to be visible to a child context the objects must be refreshed in the child context and in all the intermediate contexts between the child and where the change occurred. To keep a child context in sync with its ancestors two notifications need to be observed: NSManagedObjectContextObjectsDidChangeNotification and NSManagedObjectContextDidSaveNotification.

NSManagedObjectContextObjectsDidChangeNotification

When this notification is posted we must refresh objects in all descendant contexts of the context that posted the notification. To refresh an object call refreshObject:mergeChanges: on the context. Passing YES for mergeChanges: means that changes made to the object are re-applied after the refresh. In most cases this is the desired behaviour. If the object being refreshed had been changed in the receiving context then the receiving context will post its own NSManagedObjectContextObjectsDidChangeNotification. This means that we may redundantly call refresh on objects in contexts with multiple ancestors, but it doesn’t cause any problems. refreshObject:mergeChanges: doesn’t cause the object to be refreshed immediately. The object is refreshed when its values are next accessed. This means that, provided objects are not accessed, the order in which contexts are refreshed is irrelevant. Note that NSManagedObjectContextObjectsDidChangeNotification notifications are posted asynchronously so that there is only one notification posted per tick of the runloop. Here’s an example of a notification handler:

-(void)refreshChildContextWithObjectDidChangeNotification:(NSNotification *)notification {
    NSManagedObjectContext *changedContext = notification.object;
    NSManagedObjectContext *childContext = self.managedObjectContext;
    BOOL isParentContext = childContext.parentContext == changedContext;
    if (!isParentContext) return;
    //Collect the objectIDs of the objects that changed
    NSMutableSet *objectIDs = [NSMutableSet set];
    [changedContext performBlockAndWait:^{
        NSDictionary *userInfo = notification.userInfo;
        for (NSManagedObject *changedObject in userInfo[NSUpdatedObjectsKey]) {
            [objectIDs addObject:changedObject.objectID];
        }
        for (NSManagedObject *changedObject in userInfo[NSInsertedObjectsKey]) {
            [objectIDs addObject:changedObject.objectID];
        }
        for (NSManagedObject *changedObject in userInfo[NSDeletedObjectsKey]) {
            [objectIDs addObject:changedObject.objectID];
        }
    }];
    //Refresh the changed objects
    [childContext performBlockAndWait:^{
        for (NSManagedObjectID *objectID in objectIDs) {
            //objectRegisteredForID: returns nil if the child context has not loaded the object
            NSManagedObject *object = [childContext objectRegisteredForID:objectID];
            if (object == nil) continue;
            [childContext refreshObject:object mergeChanges:YES];
        }
    }];
}

NSManagedObjectContextDidSaveNotification

When NSManagedObjectContextDidSaveNotification is posted mergeChangesFromContextDidSaveNotification: should be called if the posting context is a root context. It’s not necessary to call mergeChangesFromContextDidSaveNotification: when the posting context is a non-root ancestor because the changes will have already been merged in from observing NSManagedObjectContextObjectsDidChangeNotification. (After mergeChangesFromContextDidSaveNotification: is called the context posts an NSManagedObjectContextObjectsDidChangeNotification but this notification does not contain the changed objects so it is not useful.) An example implementation:

-(void)refreshContextWithDidSaveNotification:(NSNotification *)notification {
    NSManagedObjectContext *savedContext = notification.object;
    BOOL isRootContext = [savedContext parentContext] == nil;
    if (!isRootContext) return;
    NSManagedObjectContext *targetContext = self.managedObjectContext;
    BOOL isCorrectStore = targetContext.persistentStoreCoordinator == savedContext.persistentStoreCoordinator;
    if (!isCorrectStore) return;
    //Merge the saved changes into the target context on its own queue
    [targetContext performBlockAndWait:^{
        [targetContext mergeChangesFromContextDidSaveNotification:notification];
    }];
}

If an NSFetchRequest has been performed in a child context it may be necessary to re-execute it. A better solution would be to use an NSFetchedResultsController, which monitors for changes and provides callbacks for updating the UI.

Avoid passing temporary objectIDs between contexts

Be careful when passing object IDs between contexts. When creating an object in a child context call obtainPermanentIDsForObjects:error: before passing the objectID to another context. obtainPermanentIDsForObjects:error: causes the persistent store to be written to, so keep the number of times this method is called to a minimum. If passing from a child context back to its parent then a good time to call obtainPermanentIDsForObjects:error: is immediately before calling save:. If passing from a parent to child then a good time is when the child context is being configured.
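As a rough illustration of the ‘immediately before calling save:’ advice, the following sketch converts every pending insert in one call and then saves. EMKObtainPermanentIDsAndSave is a hypothetical name, and the function assumes it is called on the context’s own queue.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: swaps the temporary IDs of all pending inserts for
// permanent IDs in a single call, then saves the context.
// Must be called on the context's queue, i.e. from inside performBlock: or
// performBlockAndWait:.
static BOOL EMKObtainPermanentIDsAndSave(NSManagedObjectContext *context, NSError **error) {
    NSArray *insertedObjects = [[context insertedObjects] allObjects];
    if (insertedObjects.count > 0) {
        // One batched call keeps the number of trips to the store to a minimum.
        if (![context obtainPermanentIDsForObjects:insertedObjects error:error]) {
            return NO;
        }
    }
    return [context save:error];
}
```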

Don’t use awakeFromInsert

The documentation for awakeFromInsert states:

You typically use this method to initialize special default property values

Note that it says “property”. A property can be either an attribute or a relationship. Before parentContext we could use awakeFromInsert to create additional related objects. If, for example, we had a person object and each person has a relationship to a diary object then we could use the person subclass’s awakeFromInsert to create the diary object. But because awakeFromInsert can be called multiple times, creating objects and setting relationships will result in data inconsistency errors or abandoned objects.

We could try and work round this problem(/bug). We could maintain a global set of objects that have had their setup code run. But that raises more issues: Would we still have to call NSManagedObject’s implementation of awakeFromInsert? How do we structure the check so that it works correctly when subclassed? How do we write the code so it’s not fragile to future modifications?

I think the best approach is to admit defeat and give up on awakeFromInsert. Instead, implement a factory method on the class object that creates and configures a new object. This approach isn’t ideal because it means that the interface of the subclass diverges from that of the superclass without providing additional functionality (and thus adds complexity to the interface for no gain). However, it means that we avoid all the problems with awakeFromInsert and still encapsulate the desired functionality.
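For the person and diary example, such a factory method might look like the sketch below. The class name, the method name, the entity names ‘Person’ and ‘Diary’ and the to-one ‘diary’ relationship are all assumptions made for illustration.

```objc
#import <CoreData/CoreData.h>

// Hypothetical subclass. Assumes the model contains "Person" and "Diary"
// entities and that Person has a to-one relationship named "diary".
@interface EMKPerson : NSManagedObject
@property (nonatomic, strong) NSManagedObject *diary;
+ (instancetype)insertPersonInManagedObjectContext:(NSManagedObjectContext *)context;
@end

@implementation EMKPerson
@dynamic diary;

// Creates the person and its diary together. Because this code runs exactly
// once, at the call site, it is unaffected by awakeFromInsert being called
// again when the object propagates to a parent context.
+ (instancetype)insertPersonInManagedObjectContext:(NSManagedObjectContext *)context {
    EMKPerson *person = [NSEntityDescription insertNewObjectForEntityForName:@"Person"
                                                      inManagedObjectContext:context];
    person.diary = [NSEntityDescription insertNewObjectForEntityForName:@"Diary"
                                                 inManagedObjectContext:context];
    return person;
}

@end
```

Callers then use [EMKPerson insertPersonInManagedObjectContext:context] in place of inserting the entity directly.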

Conclusion

parentContext does provide features that simplify a handful of use cases. Unfortunately the shortcomings of parentContext mean that it cannot be adopted piecemeal. At the top of the Core Data stack are managed objects. A good model will provide an interface that works at a high level of abstraction. Creating such an interface requires encapsulating implementation detail. The way that Core Data is designed means that the natural place for this code is in managed object subclasses. Because parentContext affects the behaviour of managed objects, adopting it makes it difficult to write managed object subclasses without knowing the context hierarchy in which they’ll be used. Proceed with extreme caution!

Acknowledgement

Thanks to Mike Abdullah for his insights at LIDG.