CoreData: data integrity and fetching

By Will Braynen

Sometimes you come across code like this: Array(Set(array)).sorted(by:), where array is the result of a fetch from CoreData. These operations might be scattered across different files, but in essence that is what the code does. The author put the hash set (the Set) there to remove duplicates, to “de-dupe”. CoreData uses SQLite under the hood, but a regular fetch of managed objects has no equivalent of SQL’s DISTINCT keyword (the returnsDistinctResults flag on NSFetchRequest only applies to dictionary results with propertiesToFetch set). So you cannot ask for unique objects when fetching. This leaves you with two options: do the work when you read or do the work when you write. That is, either filter in memory after the fetch or prevent duplicates on insert.
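To make the read-side option concrete, here is a minimal sketch of that pattern on its own. The Message struct is a hypothetical stand-in for whatever a fetch returns, and it assumes the records are Hashable on the fields that define a duplicate.

```swift
import Foundation

// Hypothetical value type standing in for a fetched record.
struct Message: Hashable {
    let timestamp: Date
    let content: String
}

/// Removes duplicates with a hash set, then sorts the survivors by timestamp.
func dedupedAndSorted(_ messages: [Message]) -> [Message] {
    // Set(...) drops duplicates but also discards the original order,
    // which is why a sort has to follow.
    return Array(Set(messages)).sorted { $0.timestamp < $1.timestamp }
}

let t0 = Date(timeIntervalSince1970: 0)
let t1 = Date(timeIntervalSince1970: 60)
let fetched = [
    Message(timestamp: t1, content: "world"),
    Message(timestamp: t0, content: "hello"),
    Message(timestamp: t1, content: "world"), // duplicate of the first element
]

let result = dedupedAndSorted(fetched)
print(result.map { $0.content }) // prints ["hello", "world"]
```

Both steps run over everything that was fetched, on every read.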

If you fetch from CoreData in batches, paging your requests just as you would from the network (e.g. from a REST API), and your batch size is small (e.g. a screenful), then Array(Set(array)).sorted(by:) is probably no big deal. But if you have to fetch a lot of data at once, it will get slower and slower as your dataset (or batch size) grows, because every fetched object has to be hashed and the survivors then sorted in memory.

Should you find yourself writing that kind of code and unable to fetch in batches (because of your product requirements or some other legitimate reason), consider using CoreData’s uniqueness constraints to have CoreData enforce data integrity on insert. When setting up a constraint, you specify the attributes that make up your definition of a duplicate entry (for example, same timestamp and content if it’s messages you are storing). Do keep in mind that there is a cost, and heed Apple’s warning:

“Uniqueness constraint violations can be computationally expensive to handle. The recommendation is to use only one uniqueness constraint per entity hierarchy, although subentities may extend a superentity’s constraint.”

(Source of the warning: https://developer.apple.com/documentation/coredata/nsentitydescription/1425095-uniquenessconstraints)
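For the messages example, a constraint could be set up roughly as follows. This is a sketch under assumptions, not a drop-in implementation: the Message entity and its attribute names are hypothetical, the model is built in code only to keep the example self-contained (the model editor offers the same setting), the store is a throwaway SQLite store at /dev/null, and the merge policy is one possible choice for resolving violations.

```swift
import CoreData

// Hypothetical "Message" entity, built in code so the sketch is self-contained.
let timestamp = NSAttributeDescription()
timestamp.name = "timestamp"
timestamp.attributeType = .dateAttributeType

let content = NSAttributeDescription()
content.name = "content"
content.attributeType = .stringAttributeType

let entity = NSEntityDescription()
entity.name = "Message"
entity.properties = [timestamp, content]
// One compound constraint: a row is a duplicate when both values match.
entity.uniquenessConstraints = [["timestamp", "content"]]

let model = NSManagedObjectModel()
model.entities = [entity]

// Assumption: a throwaway SQLite store at /dev/null, so nothing is kept on disk.
let storeDescription = NSPersistentStoreDescription(url: URL(fileURLWithPath: "/dev/null"))
storeDescription.type = NSSQLiteStoreType
let container = NSPersistentContainer(name: "Sketch", managedObjectModel: model)
container.persistentStoreDescriptions = [storeDescription]
container.loadPersistentStores { _, error in
    if let error = error { fatalError("Could not load store: \(error)") }
}

let context = container.viewContext
// With the default (error) merge policy, a violation makes save() throw.
// This policy asks CoreData to resolve the conflict and keep a single row.
context.mergePolicy = NSMergePolicy.mergeByPropertyObjectTrump

/// Inserts one message and saves, which is when the constraint is checked.
func insertMessage(_ text: String, at date: Date) throws {
    let message = NSManagedObject(entity: entity, insertInto: context)
    message.setValue(date, forKey: "timestamp")
    message.setValue(text, forKey: "content")
    try context.save()
}

let date = Date(timeIntervalSince1970: 0)
try insertMessage("hello", at: date)
try insertMessage("hello", at: date) // same timestamp and content as above

let request = NSFetchRequest<NSManagedObject>(entityName: "Message")
let storedCount = try context.count(for: request)
print(storedCount)
```

The extra work happens inside save(), which is where the cost Apple warns about is paid.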

Nonetheless, it is an option and might fit some use cases nicely: expensive writes, but cheap reads. Once you don’t have to worry about duplicates, you can also ask CoreData to sort the fetched results for you with a sort descriptor. Instead of Array(Set(array)).sorted(by:), you simply have array, already unique and already sorted, and your reads get cheaper accordingly.
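A sorted fetch might look like the sketch below. It assumes a hypothetical Message entity with a timestamp attribute; the function name is made up for illustration.

```swift
import CoreData

// Assumes a hypothetical "Message" entity with a "timestamp" attribute.
func fetchMessagesOldestFirst(in context: NSManagedObjectContext) throws -> [NSManagedObject] {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Message")
    // The sort is handed to the store instead of being done on a Swift array.
    request.sortDescriptors = [NSSortDescriptor(key: "timestamp", ascending: true)]
    return try context.fetch(request)
}
```

The caller gets back an array it can use as-is, with no Set and no sorted(by:) afterwards.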

CoreData doesn’t have a concept of primary keys, but it does have the concept of constraints (for data integrity) and, moreover, the concept of indexed attributes (for fast fetching) and predicates (for filtered queries). These come at a price, but might be better than sorting and filtering in memory.
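
As a rough sketch of those last two tools, again assuming a hypothetical Message entity with a timestamp attribute: the first function adds a fetch index to an entity description built in code (the model editor has an equivalent setting), and the second filters with a predicate so that only matching rows are loaded. The function names and the index name are made up for illustration.

```swift
import CoreData

// Assumes a hypothetical "Message" entity with a "timestamp" attribute.

/// Adds a binary index on `timestamp`. Call this before the model is handed
/// to a persistent container, since a model in use can no longer be changed.
func addTimestampIndex(to entity: NSEntityDescription) {
    guard let timestamp = entity.attributesByName["timestamp"] else { return }
    let element = NSFetchIndexElementDescription(property: timestamp, collationType: .binary)
    entity.indexes.append(NSFetchIndexDescription(name: "byTimestamp", elements: [element]))
}

/// Fetches only messages newer than `cutoff`, filtered and sorted by the store.
func fetchMessages(after cutoff: Date, in context: NSManagedObjectContext) throws -> [NSManagedObject] {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Message")
    request.predicate = NSPredicate(format: "timestamp > %@", cutoff as NSDate)
    request.sortDescriptors = [NSSortDescriptor(key: "timestamp", ascending: true)]
    return try context.fetch(request)
}
```

Whether the index pays for itself depends on how often you write versus how often you filter or sort on that attribute.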