SQL Server generates histogram statistics automatically whenever a column is referenced in a predicate and it is not already the leading column in an existing histogram statistic. By design, indexes create histogram statistics automatically so any column you have that is the leading column in an index will already have histogram statistics.
The autogenerated statistics are prefixed _WA_Sys_ followed by:
- the column name (in older SQL Server versions) or the column number in hex (for SQL Server 2008 onwards) e.g. EMPLID or 00000001
- the object_id of the table in hex e.g. 57490C28
separated by an underscore. So auto-generated statistics look like this:
- _WA_Sys_DESCR_57490C28 or _WA_Sys_00000013_57490C28
The issue for me as a PeopleSoft Administrator/DBA is whether I should keep these statistics or recreate them as “user” statistics with more sensible names. Oddly enough the “old style” naming convention which included the column name was actually better than the “hex” version we have now. After all what is column 00000014 (decimal 20) in table X?
My personal preference is to process the system generated statistics and recreate them as statistic “column name” on table X. I have a perl script that generates the relevant DROP/CREATE STATISTICS commands for an entire PeopleSoft database. The output looks like this:
--- Remove statistic _WA_Sys_AUDITRECNAME_5921A398 for column AUDITRECNAME on table PSRECDEFN
DROP STATISTICS [dbo].[PSRECDEFN].[_WA_Sys_AUDITRECNAME_5921A398];
--- Create User Statistic for column AUDITRECNAME (+ 1 keys) on table PSRECDEFN
CREATE STATISTICS [AUDITRECNAME] ON [dbo].[PSRECDEFN] ([AUDITRECNAME],[RECNAME]) WITH FULLSCAN; |
--- Remove statistic _WA_Sys_AUDITRECNAME_5921A398 for column AUDITRECNAME on table PSRECDEFN
DROP STATISTICS [dbo].[PSRECDEFN].[_WA_Sys_AUDITRECNAME_5921A398];
--- Create User Statistic for column AUDITRECNAME (+ 1 keys) on table PSRECDEFN
CREATE STATISTICS [AUDITRECNAME] ON [dbo].[PSRECDEFN] ([AUDITRECNAME],[RECNAME]) WITH FULLSCAN;
Why do I like to do this? Well there are a number of reasons:
- The naming convention is clearer to me – open the statistics tree within a table in object explorer and you get a list of column names plus the statistics created for any indexes.
- Once I have changed to “user” statistics I can query for any new _WA_Sys_ statistics (yes I leave auto-generate switched on). This shows me any “new” query activity that accesses via previously unused predicate columns. This can help in deciding if perhaps new indexes are required.
- I can ensure the existing statistic are recreated using a “WITH FULLSCAN” not just sampled data. For many PeopleSoft systems this is essential – data can often be very skewed and sampling can miss this on occasion.
- I like to add in the “key” columns of the table to the histogram to provide better density information for the optimizer. This can be essential information when intermediate key fields have a single value e.g. EMPL_RCD in HR is typically 0 in most cases.
If you would like to discuss this in more detail, leave a comment.