All of the Following Determine How Many Columns Can Be Included on a Page Except
SQL Server 2005 added the ability to include nonkey columns in a nonclustered index. In SQL Server 2000 and earlier, all columns defined for a nonclustered index were key columns, which meant they were part of every level of the index, from the root down to the leaf level. When a column is defined as an included column, it is part of the leaf level only. Books Online notes the following benefits of included columns:
- They can be data types not allowed as index key columns.
- They are not considered by the Database Engine when calculating the number of index key columns or index key size.
For example, a varchar(max) column cannot be part of an index key, but it can be an included column. Further, that varchar(max) column doesn't count against the 900-byte (or 16-column) limit imposed for the index key.
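The two rules above can be expressed as a small check. This is only a sketch in Python, not anything SQL Server exposes: the function name, the `Notes` column, and the byte sizes are assumptions made for illustration, and the 900-byte and 16-column limits are simply the ones quoted in the text, passed in as parameters.

```python
from typing import List, Optional, Sequence, Tuple

# (column name, maximum size in bytes); None marks a "max" type such as varchar(max)
Column = Tuple[str, Optional[int]]


def check_index_definition(
    key_columns: Sequence[Column],
    included_columns: Sequence[Column] = (),
    max_key_bytes: int = 900,    # limit quoted in the text
    max_key_columns: int = 16,   # limit quoted in the text
) -> List[str]:
    """Return a list of problems; an empty list means the definition passes."""
    problems = []
    for name, size in key_columns:
        if size is None:
            problems.append(f"{name}: type not allowed as a key column")
    key_bytes = sum(size for _, size in key_columns if size is not None)
    if key_bytes > max_key_bytes:
        problems.append(f"key is {key_bytes} bytes, limit is {max_key_bytes}")
    if len(key_columns) > max_key_columns:
        problems.append(f"{len(key_columns)} key columns, limit is {max_key_columns}")
    # included_columns is deliberately never inspected: nonkey columns
    # do not count toward either limit and may be "max" types.
    return problems


# Hypothetical columns: a 4-byte integer key and a varchar(max) payload.
customer_id = ("CustomerID", 4)
notes = ("Notes", None)

print(check_index_definition([customer_id], included_columns=[notes]))
print(check_index_definition([customer_id, notes]))
```

The first call passes because the varchar(max) column is only included; the second reports a problem because the same column has been placed in the key.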
The documentation also notes the following performance benefit:
An index with nonkey columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. Performance gains are achieved because the query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations.
We can infer that whether the index columns are key columns or nonkey columns, we get an improvement in performance compared to when all columns are not part of the index. But is there a performance difference between the two variations?
The Setup
I installed a copy of the AdventureWorks2012 database and verified the indexes for the Sales.SalesOrderHeader table using Kimberly Tripp's version of sp_helpindex:
USE [AdventureWorks2012];
GO
EXEC sp_SQLskills_SQL2012_helpindex N'Sales.SalesOrderHeader';
Default indexes for Sales.SalesOrderHeader
We'll start with a straightforward query for testing that retrieves data from multiple columns:
SELECT [CustomerID], [SalesPersonID], [SalesOrderID],
  DATEDIFF(DAY, [OrderDate], [ShipDate]) AS [DaysToShip], [SubTotal]
FROM [Sales].[SalesOrderHeader]
WHERE [CustomerID] BETWEEN 11000 AND 11200;
If we execute this against the AdventureWorks2012 database using SQL Sentry Plan Explorer and check the plan and the Table I/O output, we see that we get a clustered index scan with 689 logical reads:
Execution plan from original query
(In Management Studio, you could see the I/O metrics using SET STATISTICS IO ON;.)
The SELECT has a warning icon, because the optimizer recommends an index for this query:
USE [AdventureWorks2012];
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [Sales].[SalesOrderHeader] ([CustomerID])
INCLUDE ([OrderDate], [ShipDate], [SalesPersonID], [SubTotal]);
Test 1
We will first create the index the optimizer recommends (named NCI1_included), as well as the variation with all the columns as key columns (named NCI1):
CREATE NONCLUSTERED INDEX [NCI1]
ON [Sales].[SalesOrderHeader] ([CustomerID], [SubTotal], [OrderDate], [ShipDate], [SalesPersonID]);
GO
CREATE NONCLUSTERED INDEX [NCI1_included]
ON [Sales].[SalesOrderHeader] ([CustomerID])
INCLUDE ([SubTotal], [OrderDate], [ShipDate], [SalesPersonID]);
GO
If we re-run the original query, once hinting it with NCI1 and once hinting it with NCI1_included, we see a plan similar to the original, but this time there's an index seek of each nonclustered index, with equivalent values for Table I/O and similar costs (both about 0.006):
Original query with index seeks – key on the left, include on the right
(The scan count is still 1 because the index seek is actually a range scan in disguise.)
Now, the AdventureWorks2012 database isn't representative of a production database in terms of size, and if we look at the number of pages in each index, we see they're exactly the same:
SELECT [Table] = N'SalesOrderHeader',
  [Index_ID] = [ps].[index_id],
  [Index] = [i].[name],
  [ps].[used_page_count],
  [ps].[row_count]
FROM [sys].[dm_db_partition_stats] AS [ps]
INNER JOIN [sys].[indexes] AS [i]
  ON [ps].[index_id] = [i].[index_id]
  AND [ps].[object_id] = [i].[object_id]
WHERE [ps].[object_id] = OBJECT_ID(N'Sales.SalesOrderHeader');
Size of indexes on Sales.SalesOrderHeader
If we're looking at performance, it's ideal (and more fun) to test with a larger data set.
Test 2
I have a copy of the AdventureWorks2012 database that has a SalesOrderHeader table with over 200 million rows (script here), so let's create the same nonclustered indexes in that database and re-run the queries:
USE [AdventureWorks2012_Big];
GO
CREATE NONCLUSTERED INDEX [Big_NCI1]
ON [Sales].[Big_SalesOrderHeader] (CustomerID, SubTotal, OrderDate, ShipDate, SalesPersonID);
GO
CREATE NONCLUSTERED INDEX [Big_NCI1_included]
ON [Sales].[Big_SalesOrderHeader] (CustomerID)
INCLUDE (SubTotal, OrderDate, ShipDate, SalesPersonID);
GO
SELECT [CustomerID], [SalesPersonID], [SalesOrderID],
  DATEDIFF(DAY, [OrderDate], [ShipDate]) AS [DaysToShip], [SubTotal]
FROM [Sales].[Big_SalesOrderHeader] WITH (INDEX (Big_NCI1))
WHERE [CustomerID] BETWEEN 11000 AND 11200;
SELECT [CustomerID], [SalesPersonID], [SalesOrderID],
  DATEDIFF(DAY, [OrderDate], [ShipDate]) AS [DaysToShip], [SubTotal]
FROM [Sales].[Big_SalesOrderHeader] WITH (INDEX (Big_NCI1_included))
WHERE [CustomerID] BETWEEN 11000 AND 11200;
Original query with index seeks against Big_NCI1 (left) and Big_NCI1_Included (right)
Now we get some data. The query returns over 6 million rows, seeking each index requires just over 32,000 reads, and the estimated cost is the same for both queries (31.233). No performance differences yet, and if we check the size of the indexes, we see that the index with the included columns has 5,578 fewer pages:
SELECT [Table] = N'Big_SalesOrderHeader',
  [Index_ID] = [ps].[index_id],
  [Index] = [i].[name],
  [ps].[used_page_count],
  [ps].[row_count]
FROM [sys].[dm_db_partition_stats] AS [ps]
INNER JOIN [sys].[indexes] AS [i]
  ON [ps].[index_id] = [i].[index_id]
  AND [ps].[object_id] = [i].[object_id]
WHERE [ps].[object_id] = OBJECT_ID(N'Sales.Big_SalesOrderHeader');
Size of indexes on Sales.Big_SalesOrderHeader
If we dig into this a bit further and check sys.dm_db_index_physical_stats, we can see that the difference exists in the intermediate levels of the index:
SELECT [ps].[index_id],
  [Index] = [i].[name],
  [ps].[index_type_desc],
  [ps].[index_depth],
  [ps].[index_level],
  [ps].[page_count],
  [ps].[record_count]
FROM [sys].[dm_db_index_physical_stats](DB_ID(), OBJECT_ID('Sales.Big_SalesOrderHeader'), 5, NULL, 'DETAILED') AS [ps]
INNER JOIN [sys].[indexes] AS [i]
  ON [ps].[index_id] = [i].[index_id]
  AND [ps].[object_id] = [i].[object_id];
SELECT [ps].[index_id],
  [Index] = [i].[name],
  [ps].[index_type_desc],
  [ps].[index_depth],
  [ps].[index_level],
  [ps].[page_count],
  [ps].[record_count]
FROM [sys].[dm_db_index_physical_stats](DB_ID(), OBJECT_ID('Sales.Big_SalesOrderHeader'), 6, NULL, 'DETAILED') AS [ps]
INNER JOIN [sys].[indexes] AS [i]
  ON [ps].[index_id] = [i].[index_id]
  AND [ps].[object_id] = [i].[object_id];
Size of indexes (level-specific) on Sales.Big_SalesOrderHeader
The difference between the intermediate levels of the two indexes is 43 MB, which may not be significant, but I'd probably still be inclined to create the index with included columns to save space, both on disk and in memory. From a query perspective, we still don't see a big change in performance between the index with all the columns in the key and the index with the included columns.
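As a quick sanity check, the 43 MB figure lines up with the 5,578-page difference reported earlier, given that SQL Server pages are 8 KB. A minimal sketch of the arithmetic in Python (the function name is made up for illustration):

```python
PAGE_SIZE_KB = 8  # SQL Server data and index pages are 8 KB each


def pages_to_mb(page_count: int) -> float:
    """Convert a page count to megabytes (1 MB = 1024 KB)."""
    return page_count * PAGE_SIZE_KB / 1024


# 5,578 is the page-count difference between the two indexes quoted above.
saved_mb = pages_to_mb(5578)
print(f"{saved_mb:.1f} MB")
```

That works out to roughly 43.6 MB, which the text rounds down to 43 MB.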
Test 3
For this test, let's change the query and add a filter for [SubTotal] >= 100 to the WHERE clause:
SELECT [CustomerID], [SalesPersonID], [SalesOrderID],
  DATEDIFF(DAY, [OrderDate], [ShipDate]) AS [DaysToShip], [SubTotal]
FROM [Sales].[Big_SalesOrderHeader] WITH (INDEX (Big_NCI1))
WHERE CustomerID = 11091 AND [SubTotal] >= 100;
SELECT [CustomerID], [SalesPersonID], [SalesOrderID],
  DATEDIFF(DAY, [OrderDate], [ShipDate]) AS [DaysToShip], [SubTotal]
FROM [Sales].[Big_SalesOrderHeader] WITH (INDEX (Big_NCI1_included))
WHERE CustomerID = 11091 AND [SubTotal] >= 100;
Execution plan of query with SubTotal predicate against both indexes
Now we see a difference in I/O (95 reads versus 1,560), cost (0.848 vs 1.55), and a subtle but noteworthy difference in the query plan. When using the index with all the columns in the key, the seek predicate is the CustomerID and the SubTotal:
Seek predicate against NCI1
Because SubTotal is the second column in the index key, the data is ordered and the SubTotal exists in the intermediate levels of the index. The engine is able to seek directly to the first record with a CustomerID of 11091 and SubTotal greater than or equal to 100, and then read through the index until no more records for CustomerID 11091 exist.
For the index with the included columns, the SubTotal only exists in the leaf level of the index, so CustomerID is the seek predicate, and SubTotal is a residual predicate (just listed as Predicate in the screen shot):
Seek predicate and residual predicate against NCI1_included
The engine can seek directly to the first record where CustomerID is 11091, but then it has to look at every record for CustomerID 11091 to see if the SubTotal is 100 or higher, because the data is ordered by CustomerID and SalesOrderID (clustering key).
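The difference between a seek predicate and a residual predicate can be modeled outside the engine. The Python sketch below is a toy, not SQL Server's implementation: each index is just a sorted list of tuples, the rows are synthetic, and the function names are invented for illustration. It counts how many index entries each approach has to examine.

```python
from bisect import bisect_left

# Synthetic rows: (CustomerID, SalesOrderID, SubTotal). Values are made up.
rows = [(cust, cust * 1000 + n, float((n * 37) % 250))
        for cust in (11090, 11091, 11092)
        for n in range(200)]

# Toy leaf-level orderings. SalesOrderID (the clustering key) is carried in both.
key_index = sorted((c, s, o) for c, o, s in rows)   # CustomerID, SubTotal, SalesOrderID
included_index = sorted(rows)                       # CustomerID, SalesOrderID (SubTotal rides along)


def seek_key_index(customer, min_subtotal):
    """Both columns act as the seek predicate: start at the first qualifying entry."""
    matches, examined = [], 0
    start = bisect_left(key_index, (customer, min_subtotal))
    for c, s, o in key_index[start:]:
        if c != customer:
            break  # past the last entry for this customer
        examined += 1
        matches.append((o, s))
    return matches, examined


def seek_included_index(customer, min_subtotal):
    """Seek on CustomerID only; SubTotal is checked entry by entry (residual predicate)."""
    matches, examined = [], 0
    start = bisect_left(included_index, (customer,))
    for c, o, s in included_index[start:]:
        if c != customer:
            break
        examined += 1
        if s >= min_subtotal:  # residual predicate
            matches.append((o, s))
    return matches, examined


key_matches, key_examined = seek_key_index(11091, 100.0)
inc_matches, inc_examined = seek_included_index(11091, 100.0)
print(len(key_matches), key_examined, inc_examined)
```

Both functions return the same orders, but the key-column version examines only the qualifying entries, while the included-column version examines every entry for the customer. That is the same pattern as the gap in reads reported above.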
Test 4
We'll try one more variation of our query, and this time we'll add an ORDER BY:
SELECT [CustomerID], [SalesPersonID], [SalesOrderID],
  DATEDIFF(DAY, [OrderDate], [ShipDate]) AS [DaysToShip], [SubTotal]
FROM [Sales].[Big_SalesOrderHeader] WITH (INDEX (Big_NCI1))
WHERE CustomerID = 11091
ORDER BY [SubTotal];
SELECT [CustomerID], [SalesPersonID], [SalesOrderID],
  DATEDIFF(DAY, [OrderDate], [ShipDate]) AS [DaysToShip], [SubTotal]
FROM [Sales].[Big_SalesOrderHeader] WITH (INDEX (Big_NCI1_included))
WHERE CustomerID = 11091
ORDER BY [SubTotal];
Execution plan of query with SORT against both indexes
Again we have a change in I/O (though very slight), a change in cost (1.5 vs 9.3), and a much larger change in the plan shape; we also see a larger number of scans (1 vs 9). The query requires the data to be sorted by SubTotal; when SubTotal is part of the index key it is sorted, so when the records for CustomerID 11091 are retrieved, they are already in the requested order.
When SubTotal exists as an included column, the records for CustomerID 11091 must be sorted before they can be returned to the user, so the optimizer adds a Sort operator to the plan. As a result, the query that uses the index Big_NCI1_included also requests (and is given) a memory grant of 29,312 KB, which is notable (and found in the properties of the plan).
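The reason one plan needs a Sort and the other does not comes down to the order in which each index hands back its rows. A toy Python sketch with synthetic data (the names and values are assumptions, not taken from the test database) makes the point:

```python
def is_sorted(values):
    """True when the sequence is already in ascending order."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Synthetic rows: (CustomerID, SalesOrderID, SubTotal). Values are made up.
rows = [(cust, cust * 1000 + n, float((n * 37) % 250))
        for cust in (11090, 11091, 11092)
        for n in range(200)]

# SubTotal values for one customer, in the order each toy index stores them.
from_key_index = [s for c, s, o in sorted((c, s, o) for c, o, s in rows)
                  if c == 11091]   # ordered by CustomerID, SubTotal
from_included_index = [s for c, o, s in sorted(rows)
                       if c == 11091]   # ordered by CustomerID, SalesOrderID

print("key index needs a sort:     ", not is_sorted(from_key_index))
print("included index needs a sort:", not is_sorted(from_included_index))

# The extra step the included-column plan has to pay for:
sorted_after_the_fact = sorted(from_included_index)
```

In this model the rows read from the key index are already ordered by SubTotal within the customer, so they can be returned as they are. The rows read from the included-column index come back in SalesOrderID order and need the explicit sort, which is the work the Sort operator and its memory grant represent in the real plan.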
Summary
The original question we wanted to answer was whether we would see a performance difference when a query used the index with all columns in the key, versus the index with most of the columns included in the leaf level. In our first set of tests there was no difference, but in our third and fourth tests there was. It ultimately depends on the query. We only looked at two variations (one had an additional predicate, the other had an ORDER BY); many more exist.
What developers and DBAs need to understand is that there are some great benefits to including columns in an index, but they will not always perform the same as indexes that have all columns in the key. It may be tempting to move columns that are not part of predicates and joins out of the key, and just include them, to reduce the overall size of the index. However, in some cases this requires more resources for query execution and may degrade performance. The degradation may be insignificant; it may not be. You will not know until you test. Therefore, when designing an index, it's important to think about the columns after the leading one, and understand whether they need to be part of the key (e.g., because keeping the data ordered will provide benefit) or if they can serve their purpose as included columns.
As is typical with indexing in SQL Server, you have to test your queries with your indexes to determine the best strategy. It remains an art and a science: trying to find the minimum number of indexes to satisfy as many queries as possible.
Source: https://sqlperformance.com/2014/07/sql-indexes/new-index-columns-key-vs-include