MongoDB multikey indexes and index intersection

MongoDB multikey indexes and index intersection

  • 2016-08-05
  • 873

MongoDB has great support for Arrays . You can embed arrays in documents, embed documents in arrays etc – it affords a lot of flexibility in your document model. However working with arrays has a number of gotcha’s in Mongodb. In this post we will look at some of the issues with indexes and arrays.

Multikey Indexes

In mongodb you can index an array field. This creates an index entry for each element in the array. The resulting index is called a ‘Multi key’ index. Multikey indexes can be created over scalar values or embedded documents. For more information on multi key indexes refer to the documentation

Multi key indexes although useful have several limitations

  • If you create a compound multi key index then you have atmost one field that is an array
  • A compound index cannot be used a shard key
  • A compound index cannot be a hashed index

One of the most interesting aspects of multi key indexes is how index intersection bounds are calculated – more on that below

Index intersect bounds

Here is how the MongoDB documentation defines Index intersect bounds

“The bounds of an index scan define the portions of an index to search during a query. When multiple predicates over an index exist, MongoDB will attempt to combine the bounds for these predicates by either intersection or compounding in order to produce a scan with smaller bounds.”

Range queries on Arrays

Lets get started with a simple example to see how mongodb computes index bounds for queries on arrays. Assume we have the following three documents in a collection

{x: 65}
{x: 35}
{x: [12,95]}

We issue the following query

db.coll.find({x :{ $gt :22, $lt:55})

The query is simple enough – you would expect the answer to be {x:35} but the query returns

{x:35}
{x:[25,95]}

The reason is how mongodb deals with arrays – the same element of the array does not need to match both conditions, as long as there is one element matching each condition its a match.  So in this case the bounds are [22, Infinity] and [-Infinity, 55]. Since an ‘elemMatch’ operator is not used mongodb does not use the index intersection. MongoDB does not specify which of these ranges [22, Infinity] or [-Infinity, 55] will be used for the execution of the query.

If we want to use the index intersection then we need to use the follow query

db.coll.find(x :{ $elemMatch:{$gt :22,$lt:55}})

When you use this mongodb intersects the index bounds and uses [22, 55] as the bounds. As expected this query does not return any results (elemMatch does not match non arrays). So essentially range queries on arrays are fairly useless without the $elemMatch operator

Compound multikey indexes - mixing array and non array fields

Consider a collection with the following documents

{item: 35, prices:[250,35]}
......
{item: 106, prices:[1500,65]}

We’ll add a compound index on this collection

db.ensureIndex({item:1, prices:1});

Now lets run a simple query.

db. coll. find({item: {$gt:12, $lt:65}});

The  query looks simple enough – we are using a non array item with a fixed range and I expect the Index intersect bounds to be something like item:[[12,65] ] for the query. However if you run an explain you will see this

"indexBounds" : {
"item" : [ [ -Infinity, 65 ] ],
"prices" : [ [ { "$minElement" : 1 }, { "$maxElement" : 1 } ] ]
},

The reason is that mongodb detects that this is a “Multi key” index and does not process the index bounds intersection. It does not matter that your query is not using any array fields. The moral of the story is that when you mix array and non array fields in a index always keep an eye on your index intersection bounds – odds are that it is not efficient.

Expert MongoDb

MongoDb Tutorial for Beginners

Mean Stack Tutorial for Beginners

Angular 2 and NodeJS - The Practical Guide to MEAN Stack 2.0