SSW Foursquare

Rules to Better SQL Databases - Indexing - 6 Rules

Heres how to get started with indexing. This covers the basics to help a developer get started creating indexes on SQL databases.

Want to develop your SQL Server Database with SSW? Check SSW's Databases Consulting page.

  1. Do you know how to visualize SQL indexes?

    It's hard to understand what indexes are on SQL Databases. Find out how to visualize what's happening so you can easily follow what your database queries are doing.

    Database indexes get their name from book indexes because they work similarly. To grasp how they function, imagine the rows in your table as pages in a book. Then, consider what kind of index in the book would help you quickly find the information you need.

    Video: How to visualize SQL indexes? | Bryden Oliver | SSW Rules (3 min)

    For instance in a recipe book that contains many types of dishes, you might want an index of the type of recipe. You may also have an index based on the recipe name, or maybe by ingredient.

    index
    Figure: Typical book index

    SELECT *
    FROM Badges
    WHERE Name = 'John Doe';

    Bad example: Using a table scan - the DB needs to look at every row to find the result

    CREATE INDEX IX_Name_Date
    ON dbo.Badges(Name) INCLUDE (Location);
    
    SELECT *
    FROM Badges
    WHERE Name = 'John Doe';

    Good example: Using an index - DB can quickly find the result

    By using the index, the database can quickly locate the rows where "Name" is 'John Doe' without scanning every row. This approach significantly improves query performance.

    Visualizing indexes

    Another useful thing to know is that if you'd like to see what the index looks like and try using it manually to fulfil your query (this a great way to verify you got the index correct) then you can do the following.

    Just write a SELECT statement that contains all the columns you are including in the index. Use an ORDER BY based on the columns in the Index Key to get the data sorted just like the index you are planning to create.

    Then try fulfilling some simple queries. A good way to know if your index is going to significantly improve performance is if you can quickly seek to the exact record/s you need rather than having to look at every single row one by one.

    For example if you are planning the following index:

    CREATE INDEX IX_Name_Date
      ON dbo.Badges(Name,Date) INCLUDE (Location);

    Then you can visualise the index by running the following query:

    SELECT
      Name, Date, Location
    FROM
      dbo.Badges
    ORDER BY
      Name, Date

    Note how the "Name" and "Date", which are the index keys, are included in the ORDER BY clause and the "Location" is only in the requested columns.

    If you'd like to know more or see this in action, check out How to Plan for Database Performance

  2. Do you know how to index SQL ORDER BY clauses?

    So you've identified that you need to improve the performance of a SQL ORDER BY clause. How can you create an index to improve the performance?

    When creating indexes for your SQL tables, creating indexes to improve the performance of ORDER BY clauses is by far the easiest index to create.

    Creating the index

    Basically just take the columns from the ORDER BY and drop them into the index definition in the same order.

    So for example if you want to optimize the following query by providing an appropriate index:

    SELECT
      DisplayName, Location, FullName
    FROM
      dbo.Users
    ORDER BY
      DisplayName, Location

    The corresponding index definition would look like:

    CREATE INDEX IX_DisplayName_Location ON dbo.Users (DisplayName, Location)

    Including extra columns

    One other thing you can do when creating indexes is to include extra columns not related to the order of the rows in the index. So in the example above, the FullName column appears in the list of columns requested by the query.

    If you don't include FullName in the index at all, then the database engine then has to go back to the orginal table to retrieve the values for that column.

    You can include the FullName column in the index by adding an INCLUDE clause to the index definition.

    CREATE INDEX IX_DisplayName_Location ON dbo.Users (DisplayName, Location) INCLUDE (FullName)

    Be aware that including columns in an index does make the index bigger. So deciding whether to include extra columns or not is a tradeoff between disk space, the extra time it takes to write the index, the extra disk io required when reading back the index, against the improvement achieved for the queries in question.

    If queries consistently want the same small subset of the columns on a table then this will generally be a good optimization.

    Also, it is often better to include the included columns at the end of the index key as there is very little benefit unless the size of the included columns is quite large (think large fixed length strings).

  3. Do you know how to index SQL WHERE clauses?

    So you've identified that a WHERE clause is causing query performance issues. How can you create an index to alleviate the problem?

    Indexing a single comparison in a WHERE clause

    -- Here's our querySo let's say we have the following query:

    SELECT 
     Id, DisplayName, Location
    FROM
     dbo.Users
    WHERE 
     DisplayName = 'Frank'

    We want to quickly be able to skip to data matching the DisplayName, so we should create an index on that column.

    CREATE INDEX IX_DisplayName ON dbo.Users (DisplayName)

    But the original query was also reading the location column. It may be worth verifying whether including the location column is valuable.

    CREATE INDEX IX_DisplayName_Includes ON dbo.Users (DisplayName) INCLUDE (Location)

    Having created 2 options, it's important to evaluate which of the options works better.

    So test the query as shown.

    SET STATISTICS IO ON;
    SELECT 
     Id, DisplayName, Location
    FROM
     dbo.Users WITH (Index = 1) -- use the primary key for a baseline
    WHERE 
     DisplayName = 'Frank'
    
    SELECT 
     Id, DisplayName, Location
    FROM
     dbo.Users WITH (Index = IX_DisplayName)
    WHERE 
     DisplayName = 'Frank'
    
    SELECT 
     Id, DisplayName, Location
    FROM
     dbo.Users WITH (Index = IX_DisplayName_Includes)
    WHERE 
     DisplayName = 'Frank'

    This returns data like the following.

    (704 rows affected)
    Table 'Users'. Scan count 9, logical reads 45184, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    
    (704 rows affected)
    Table 'Users'. Scan count 1, logical reads 2171, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    
    (704 rows affected)
    Table 'Users'. Scan count 1, logical reads 6, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    So in this example using no index was more than 20 times worse than the first index. However the second index with the location column included was another 300 times better again. This shows that it's always worth testing whether using an INCLUDE clause on an index is effective.

    Indexing multiple comparisons in a WHERE clause

    So if the query becomes a bit more complicated then how would we index that. So for the following:

    SELECT 
     Id, DisplayName, Location
    FROM
     dbo.Users
    WHERE 
     DisplayName = 'Frank'
     AND Location = 'United States'

    Based on the index created in the previous section, it looks like the index might be optimal. But if you were to run the following.

    SELECT 
     Id, DisplayName, Location
    FROM
     dbo.Users
    WHERE 
     DisplayName = 'Frank'
    -- AND Location = 'United States'

    You will see it's not actually optimal. If it were then all the United States records would be grouped together.

    CREATE INDEX IX_DisplayName_Location ON dbo.Users (DisplayName,Location)

    This is a better index for the query.However reversing the order of the 2 keys may be better. The key that has more uniqueness is the better one to have first.This index may be better:

    CREATE INDEX IX_Location_DisplayName ON dbo.Users (Location,DisplayName)

    To test this then try the following:

    SET STATISTICS IO ON
    SELECT 
     Id, DisplayName, Location
    FROM
     dbo.Users WITH (Index = 1)
    WHERE 
     DisplayName = 'Frank'
     AND Location = 'United States'
    
    SELECT 
     Id, DisplayName, Location
    FROM
     dbo.Users WITH (Index = IX_DisplayName_Location)
    WHERE 
     DisplayName = 'Frank'
     AND Location = 'United States'
    
    SELECT 
     Id, DisplayName, Location
    FROM
      dbo.Users WITH (Index = IX_Location_DisplayName)
    WHERE 
     DisplayName = 'Frank'
     AND Location = 'United States'

    Read the io stats from the messages returned by the query to identify which query performed less logical reads. That is the better index for this query.

  4. Do you know how to test your indexes in SQL Server?

    So you've created some indexes. How can you go about verifying that the indexes have actually improved the performance of your queries?

    So the previous rule talked about whether or not to INCLUDE extra columns in the index or not. When doing so, it would be great to be able to evaluate different options and identify which provide the best balance.

    For SQL Server there is what is called a query hint. This particular query hint can be used to force the Query Optimizer to use a particular index on a table.

    So if you create an index called IX_Index and you'd like to see if using the index improves a particular query, you can do something like the following. NB: The table itself is designated as Index 1.

    SET STATISTICS IO ON; # Grab the IO statistics for the query
    SELECT
      Id, DisplayName
    FROM 
      dbo.Users ** WITH (INDEX = 1) # Use the primary key. 
    
    SELECT
      Id, DisplayName
    FROM 
      dbo.Users ** WITH (INDEX = IX_Index) # Use the primary key. 

    You'll get back output like this:

    (704 rows affected)
    Table 'Users'. Scan count 9, logical reads 45184, physical reads 1, page server reads 0, read-ahead reads 44526, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    
    (704 rows affected)
    Table 'Users'. Scan count 1, logical reads 2171, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    So the output contains 1 line for each of the 2 queries that got run. The important value is the number of logical reads performed. Logical reads are a measure of the number of data pages that were read by the query. Logical reads cover every page that was read while physical reads only counts pages that had to be read from disk. Because caching can be inconsistent and the hit ratio for the cache will reduce as the data becomes larger, it's always better to know the logical reads rather than the physical ones.

  5. Do you know how to index SQL JOINs?

    So you've identified that you need to improve the performance of a SQL JOIN. How can you create an index to improve the performance?

    Joins behave just like a where clause. So for this statement:

    SELECT 
     u.DisplayName,
     c.CreationDate,
    FROM
     dbo.Users u
     INNER JOIN dbo.Comments c ON u.Id = c.UserId
    Where
     u.DisplayName = 'Jernej Kavka'

    The following index would provide a performance increase.

    CREATE INDEX IX_UserId_INCLUDES_CreationDate on dbo.Comments (UserId, CreationDate)

    Note the fact that rather than including CreationDate in an included column it's been included in the indexed columns. That's because dates are a nice small column type, so the extra storage in the index tree nodes is negligible.

  6. Do you know how to use SQL Server's automatic index tuning?

    Feel like you don't know enough to manage the indexes on your SQL databases? SQL Server and Azure SQL databases have that covered for you.

    In SQL Server 2022 there are lots of features that make day to day database management much easier. These were introduced in SQL Server 2017 and are continually being improved. The Automatic Tuning features can either be used to make recommendations of changes to indexes and query plans, or it can be allowed to take control and make the recommended changes without user intervention.

    In small organizations without a dedicated database administrator, turning these features on is strongly recommended. Also for less important servers it can free up the database administrator's time to deal with only the most business critical databases and servers.

    See detailed information on this feature.

We open source.Loving SSW Rules? Star us on GitHub. Star
Stand by... we're migrating this site to TinaCMS