What’s interesting in multiple-column indexes?

MySQL can create composite indexes (that is, indexes on multiple columns). An index may consist of up to 16 columns. For certain data types, you can index a prefix of the column.
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
A multiple-column index can be considered a sorted array, the rows of which contain values that are created by concatenating the values of the indexed columns.
Another realization is to simply have a b-tree within a b-tree. When you hit a leaf on the first column, you get both a list of matching records and a mini b-tree of the next column and so on. Thus, the order of the columns specified in the index makes a huge difference on whether that index will be useful for particular queries.

Multiple column index (A, B, C):

WHERE A=10 AND B>405;                A B C          Constant: A=10

WHERE B=10 A=9 AND C<504;            A B C          Range: A>10

WHERE A=10 AND B=7 ORDER BY C;       A B C

WHERE B=3                            ø

WHERE A=10 AND B>4 AND C>17;         A B C

WHER A=10 ORDER BY C;                A B C

WHERE A=10 ORDER BY B, C;            A B C

WHERE A=10 ORDER BY B, C DESC;       A B C

On the example above, various options are presented, and it should be noted that the index breaks off on the first inequality, i.e. it is used sequentially from left to right to the first inequality. After the inequality, the rest of the index is no longer used. Those, when condition A is a constant, B is a range, the first two parts will be used. In the second case, the whole index will be used. In the third case, the index will also be used entirety. The order of the constants in the condition does not matter, and the server can rearrange them. In the case of B = 3, the index, in general, will not be used. In the case of a constant and an inequality, only the first two parts will be used, the index will no longer be able to be used for the third part. In the case of A – a constant and sorting by the last, the index will only be used for the first part, because the second part is skipped, the ultimate index will be used entirely. And in the latter variant, again, the index will only be used for the first part, because for sorting the index can only be used when in one direction it goes.
This is due to the fact that the composite index in MySQL is, in fact, the usual b-tree index over the concatenation of included columns, accordingly, it can move either by increasing the entire tuple or by its descending order. Therefore, we cannot say: “We move in increasing B and decreasing C” by the index. This restriction is typical for MySQL.

Nuances

List

Where A in (0,1) and B=5
A B
Range
↓
Where A beetween 0 and 1 and B=5
A B

There is such a nuance here. For example, A in (0,1) and A between 0 and 1 are equivalent forms, in both cases those are ranges, but in the case when it is A in (0,1) – list, MySQL realizes that it is not range and replaces it by multiple equality conditions. In this case, MySQL will use the index. So one needs to make a right decision – either to write by a list or by putting <> signs.
Let’s look another example:

Where B=5 <=> Where A in (0,1) and B=5

(select .. Where A=0 and B=5 order by C limit 10)

union all

(select .. Where A=1 and B=5 order by C limit 10)

order by C limit 10;

Here if we have the index(A, B), then it will not be used for “where B = 5” condition, but we can make it in an application so that it independently substitutes the missing condition if there is a small possible version of the list. But with this recommendation, you need to be very cautious, because, despite the fact that in the presence of a list, the further use of the index is not discarded, it cannot be used for sorting, i.e. only on equality. Therefore, if we have queries for sorting, then we will have to rebuild the query through “union all” so that there are no lists in order to use sorting. Naturally, this is not always possible and not always convenient.

Index Condition Pushdown

Index Condition Pushdown (ICP) is an optimization for the case where MySQL retrieves rows from a table using an index. Without ICP, the storage engine traverses the index to locate rows in the base table and returns them to the MySQL server which evaluates the WHERE condition for the rows. With ICP enabled, and if parts of the WHERE condition can be evaluated by using only columns from the index, the MySQL server pushes this part of the WHERE condition down to the storage engine. The storage engine then evaluates the pushed index condition by using the index entry and only if this is satisfied is the row read from the table. ICP can reduce the number of times the storage engine must access the base table and the number of times the MySQL server must access the storage engine.
For example, suppose you have a key defined as:

index 'i_partkey' ('partkey', 'quantity', 'shipmode', 'shipinstruct')

and the WHERE condition defined as:

partkey = X

and quantity >= 1

and quantity <= 1+10

and shipmode in ('AIR', 'AIR REG')

and shipinstruct = 'DELIVER IN PERSON'

Then MySQL will use the key as if it’s only defined as including columns partkey and quantity and will not filter using the columns shipmode and shipinstruct. And so all rows matching condition partkey = x and quantity >= 1 and quantity <= 1+10 will be fetched from the Primary Key and returned to MySQL server which will then, in turn, apply the remaining parts of the WHERE clause shipmode in (‘AIR’, ‘AIR REG’) and shipinstruct = ‘DELIVER IN PERSON’. So clearly if you have thousands of rows that match partkey and quantity but only a few hundred that match all the condition, then obviously you would be reading a lot of unnecessary data.
This is where ICP comes into play. With ICP, the server pushes down all conditions of the WHERE clause that match the key definition to the storage engine and then filtering is done in two steps:

  • Filter by the prefix of the index using traditional B-Tree index search
  • Filter by applying the where condition on the index entries fetched

There are common nuances with other databases, for example, when the index is part of an expression, as, in PostgreSQL, it cannot be converted, so if we have id + 1 = 3 in the query, the id will not be used. We must bear it ourselves. If the index is part of some expression, we need to look at whether we can transform it so that the index is rendered explicitly to the left.
Likewise, due to the fact that it does not perform transformations (not only mathematical, it also can be a mismatch between encoding, type conversion, etc.), the index will not be used either. The index is not used when the first part is skipped when searching for the suffix. And, of course, the index will not be used when comparing with fields of a similar table, because in this case, it will first need to read the entry from the table to compare.

Comments(0)

Leave a Comment