In SQL Server 2012 and earlier, the join selectivity (as a whole) is estimated using the SalesOrderID histograms from each table (computed for each histogram step, after step boundary alignment using linear interpolation as necessary). It remains the same whether the join is ultimately performed using a Nested Loops, Hash, or Merge Join physical operator. Note that the (logical) join selectivity is independent of the choice of physical join operator. The result, using single-precision floating-point arithmetic, is 11.4867. The 11.4867 number is derived (for display in showplan) by dividing the computed estimated cardinality of the join output (30.0919) by the number of iterations (2.61972). Using the SQL Server 2012 cardinality estimator, the selectivity of the join drives the estimated number of rows on the inner side of the nested loops join, and not the other way around. I think my understanding of the second estimate is incorrect and the differing numbers seems to indicate that.
So if the estimated number of rows to be loop through is 2.61972, and the average to be returned in 3.85561, the I would think the estimated number of rows would be 2.61972 * 3.85561 = 10.10062.īut the estimated number of rows is 11.4867. If there are 31465 unique SalesOrderID's in the SalesOrderDetail, then with an even distribution, the average number of rows per SalesOrderID is 121317 (total number of rows) divided by 31465. That means that 1/3.178134E-05 (31465) equals the number of unique SalesOrderID values in the SalesOrderDetail table. So the density of the SalesOrderID (which I am joining on) is 3.178134E-05. If I run DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', 'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID'): The part I don't understand is the estimated number of rows for the clustered index seek against the SalesOrderDetail table. The average number of distinct rows between 1101 is 2.619718, or rounded to 2.61972 which is the value of estimated rows shown for the index seek. If I use DBCC SHOW_STATISTICS ('Sales.SalesOrderHeader', 'IX_SalesOrderHeader_CustomerID') WITH HISTOGRAM, it shows that the value 11077 is between the two sampled keys 1101. The initial index seek (top right) is using the IX_SalesOrderHeader_CustomerID index and searching on the literal 11077. If I look at the estimated execution plan, I see the following:
I am running this query in the AdventureWorks2012 database: SELECT