Statistics are very important object in the SQL
Server which is used for query optimization to define the query execution plans.
They contain statistical information about the distribution of values in one or
more columns of a table.
The optimiser does the smart work and always uses a cost based weighting to choose a best plan, and the statistics are used to estimate the number of rows that will be processed by a given operator, and hence the cost of that operator.
With the help of statistics information, the optimizer would know the selectivity for the data.
The optimiser does the smart work and always uses a cost based weighting to choose a best plan, and the statistics are used to estimate the number of rows that will be processed by a given operator, and hence the cost of that operator.
With the help of statistics information, the optimizer would know the selectivity for the data.
Whenever
statistics are created against any table then Database Engine sorts the values
of the columns on which the statistics are being built. The query optimizer
computes a histogram on the column values in the first key column of the
statistics object. A histogram measures the frequency of occurrence for each distinct
value in a data set. Histogram has limitation of 200 steps, separated by
intervals.
What
Are Statistics Objects?
Each statistics object is
created on a list of one or more table columns which includes a histogram
displaying the distribution of values in the first column. Statistics objects
on multiple columns also store statistical information about the correlation of
values among the columns. These correlation statistics, or densities, are
derived from the number of distinct rows of column values.
In a very simple manner, we can understand that histogram is nothing
but it is something which specifies how many rows exactly match each interval
value, how many rows fall within an interval, and a calculation of the density
of values, or the incidence of duplicate values, within an interval.
As the data in a column changes, index and column statistics can become out-of-date and cause the query optimizer to make less-than-optimal decisions on how to process a query. However, the query optimizer still considers it to be a good candidate based on the index's outdated distribution statistics that are based on the data before the update.
Multicolumn statistics – Just what they sound like, statistics over multiple columns. These are created by default whenever we create a composite index. We can also create them ourself without an index using the CREATE STATISTICS command. This will be helpful us when we have queries that filter on multiple columns absence of indexes.
Filtered statistics – We can also put a filter on our statistics. This is useful when we have a very large table but normally only query on a small subset of rows. If these rows are highly selective, creating filtered statistics will help the optimizer choose the best path.
What are Filtered Statistics?
The Different way statistics being created
References: https://msdn.microsoft.com/en-gb/library/ms174384.aspx
Multicolumn statistics – Just what they sound like, statistics over multiple columns. These are created by default whenever we create a composite index. We can also create them ourself without an index using the CREATE STATISTICS command. This will be helpful us when we have queries that filter on multiple columns absence of indexes.
Filtered statistics – We can also put a filter on our statistics. This is useful when we have a very large table but normally only query on a small subset of rows. If these rows are highly selective, creating filtered statistics will help the optimizer choose the best path.
What are Filtered Statistics?
Filtered statistics are
nothing but they play a very important role to improve query performance for
SQL data queries that select from well-defined subsets of data. Filtered
statistics use a filter predicate to select the subset of data that is included
in the statistics. Well-designed filtered statistics can improve the query
execution plan compared with full-table statistics.
SQL Server is responsible
to maintain statistical information about key value distribution for all b-tree
indexes so that it can more accurately estimate the number of qualifying rows.
This information is used by the cost-based optimizer to generate the most
efficient execution plan for the query, and aid in decisions about whether to
use or not use an index for a particular query.
The most important thing
is that Statistics do not have the b-tree structures to facilitate locating the
data. However, they do provide cardinality statistics that the optimizer can
use to develop more accurate row count estimates, resulting in more efficient
query plans. Cardinality estimator is only interested in predicates.
How
to view Statistics within Data table?
DBCC SHOW_STATISTICS
displays current query optimization statistics for a table or indexed view. The syntax lets you specify a table or indexed
view along with a target index name, statistics name, or column name:
DBCC SHOW_STATISTICS
( table_or_indexed_view_name , target )
[ WITH [ NO_INFOMSGS ] < option > [ , n ]
]
< option
> :: =
STAT_HEADER | DENSITY_VECTOR | HISTOGRAM | STATS_STREAM
Arguments
table_or_indexed_view_name: Name of the table or indexed view for which to display statistics
information. Name of the table that contains the statistics to display. The
table cannot be an external table.
Target: Name of
the index, statistics, or column for which to display statistics information.
Target is enclosed in brackets, single quotes, double quotes, or no quotes.
If target is a name of an existing index or statistics on a table or indexed
view, the statistics information about this target is returned
NO_INFOMSGS: Suppresses
all informational messages that have severity levels from 0 through 10.
STATS_STREAM is
Identified for informational purposes only. Not supported. Future
compatibility is not guaranteed.
|
NOTE: If
target is the name of an existing column, and an automatically created
statistics on this column exists, information about that auto-created statistic
is returned. If an automatically created statistic does not exist for a column
target, error message 2767 is returned. In SQL Data Warehouse and Parallel Data
Warehouse, target cannot be a column name.
Example:
DBCC SHOW_STATISTICS('[dbo].[Employee]', [_WA_Sys_00000001_0EA330E9]);
DBCC SHOW_STATISTICS('[dbo].[Employee]', [_WA_Sys_00000001_0EA330E9]) WITH STATS_STREAM;
|
The Different way statistics being created
There are many ways to create statistics on the indexes on tables. Some of them are given below:
- The query optimizer creates statistics on key column for indexes on tables when the index is created.
- The query optimizer creates statistics for single columns in query predicates.
- Composite indexes creates Multi column statistics.
- sp_createstats stored procedure.
- Create Statistics Statement.
References: https://msdn.microsoft.com/en-gb/library/ms174384.aspx
Controls of Mathematics, Statistics, Computer science, and Information innovation adds to their speculations and systems in the foundation of the field of Data Science. ExcelR Data Science Courses
ReplyDeleteYour music is amazing. You have some very talented artists. I wish you the best of success. 数学家教
ReplyDelete