What Changed?
Building the Fast Track to the Answers When Troubleshooting Microsoft SQL Server Reliability and Performance Problems
For a Microsoft SQL Server 2000/2005/2008 or 2012 database engine instance, the configuration metadata can be taken to include the collection of knobs, switches, settings, Data Definition Language (DDL) scripts necessary to exactly recreate that SQL instance. When unexpected database problems occur, regardless if on a production server or a unit test desktop computer, quick and easy access to the information about database configuration changes coincidental to the problem are invaluable to the troubleshooting or forward recovery effort. In fact, just having the previous configuration readily available to effect a rollback can be the difference between meeting service level expectations or not. Once a crisis is stabalized, the ability to identify configuration changes over time is useful in understanding what went wrong and, more importantly, in preventing further episodes of the same problems. For development shops and some database server farms, the ability to compare configurations between SQL Instances is indispensible to understanding - or better yet preventing - a change related system degradation or outage.
Almost every DBA has an arsenal of tools that help in some way with this all to common scenario. Unfortunately, many of these tools are expensive for the server. Many require that the DBA be connected to the faltering system and throwing some pretty heavy duty queries at the system. Furthermore, allowing non-DBA staff ad hoc research and comparison access on live SQL Instances can not only impact performance but runs a much higher risk to make matters worse. There is also an increased likelihood that the unexpected - and it will almost never be a desirable side effect - will happen. For example, if a server is running stressed, say at 65%+ CPU for extended periods for a dedicated SQL Server host, the likelihood that any administrative or analytical query activity can impact performance is greater than if the CPU hovers around 20% or less most of time. Even if the problem is not specifically performance related, the research can cause blocking and unintended failure of critical application operation at the database.
The DBAs troubleshooting arsenal is tested to some degree. The dedicated admin connection of SQL Server 2005 and later is also helpful. Analysts and developers on the other hand tend to follow a linear trouble shooting path based more on thier knowledge of the application than the implications of their actions on the SQL Server under inspection. For this reason, especially if database reliability and performance are important for the organization, internal analysis should not require direct analyst or developer access to live systems. Instead, monitoring for change that is presented as standardized configuration and performance metadata, sharing of knowledge, and asynchronous dissemination of information across the teams developing and supporting the SQL Server application is preferred over allowing everyone direct and undiciplined accesss to the information they need to properly support the application. It is equally important to protect the live database and at the same time protect the live system from unnecessary ad hoc access.
Typically, the development team will have a master source reference stored in a source control database such as CVS, Sourcesafe or Surround. This master copy goes a long way toward providing much of the information a developer needs to recreate a problem in a one-off environment. However, that can be a time consuming operation. Futher, the accuracy of the "master" source for SQL Server database object scripts is not guaranteed to be accurate. Many developers are painfuly aware of this disconnet between the master source and the actual database. Consequently hey prefer to get scripts directly off the live system when starting a maintenance development iteration rather than rely on what is in the master souce. Why SQL Server has no integrated tools to correct this fundamental flaw remains unexplain.
From the DBA's perspective, all that matters is that the configuration on a live system or configuration extracted by directly monitoring for change in live systems is not to be confused with the master source of the code base or a suitable source control system in a development shop. Instead, the output of the change monitoring ought to be used to regularly verify and invariably re-sync the so called master source with what actually exists on the live database server.
Isolating "Database Problems"
It is always interesting what business folks will lable as a database problem. Most of the time non-DBA assessments that conclude that there is a database problem will not be database problems. Instead they will be application or infrastructure problems that manifest as dproblems getting data from the database. We will save that topic for another day. Here lets assume that we have already worked through the preliminary steps to know that a problem is in one that must be resolved by the DBA within the database.
One of the more common scenarios for a run time system glitch is during – or more classically, shortly after - a planned change to the system. The typically brief lag is because in most cases, where changes are applied to a running live system, the application code is changed after the database changes are made. Especially in those shops where too many people have the ability to make changes to the production environment, the need to be able to identify who-did-what-when is often a key to finding the shortest path to resolving the problem, even if only to eliminate the most likely causes first. Unfortunately, the ability to satisfy this need is not as frequently found as it is needed. Keys to quick problem identification and to future avoidance tactics include the ability to quickly identify:
A good monitoring strategy will assure that the necessary ongoing data collection happens and that the data remains available to the designated audience for an appropriate time. Also, having automation in place to capture the information necessary to revert to a working state is important for quick and easy troubleshooting and problem resolution - especially so in the absence of release based change protocols that mandate tested rollback processes.
Production Environments
Every Database Administrator (DBA) spends more time fighting fires in the production environment than is comfortable. It comes with the territory. Much of the time and energy invested in defining and building monitoring systems for the run time environment are in defense of previous encounters with known digital arsonists. Unfortunately, and in spite of the evidence pointing overwhelmingly toward the need to do so, the ability to identify what has changed is seldom adequately addressed in the IT infrastructure. There are several reasons for this. Foremost, it is a hard problem. The difficulty of the task, the resource requirements to create and manage the monitors, and in some cases convincing the organization of the value of even the most obvious and urgent issues caused by change is difficult because the business has some expectation that things over in IT break when change happens. There is little doubt that every IT organization using SQL Server, working within the bounds of past experience to meet the demand of current requirements, cannot meet the challenge and reap the reliability and maintainability benefits of monitoring SQL Server configuration change.
Pre-Production Test Environments
Being able to quickly confirm that only expected differences exist between a test environment and a run time environment is often essential for effective testing and is helpful to confirm the database change list exposed in the testing. In scenarios where the introduction of change is meticulously serialized and therefore more easily managed, as in an quality assurance regression testing or pre-production acceptance environments, comparison capabilities between Instances are very useful and usually quite straight forward ti implement. These same software lifecycle environments often present security barriers that limit such comparisons. On the other hand in parallel path development environments and in particular in software lifecycle environments where change scripts are developed and tested, the abiliy to identify the delta versus a downstream live environment is equally as critical but many times more difficult to achieve. Tools that produce accutate difference reports veru quickly are essential.
Development Environments
In almost all development scenarios change to the environment is chaotic. Having an accurate record of configuration and DDL settings on development SQL Servers is essential to
assure that the environment remains as similar to live environments as practical
identify and verify change requirements from development upstream in the software lifecycle
reduce the possibility for lost work
Being able to confirm any SQL Server Instance’s configuration against another SQL Instance and a master catalog is essential.
Development environments often require periodic data refreshing either directly for or as an accurate simulation of current production data. A typical development data refresh scenario in insipient applications is to restore the run time or production database, apply pending changes that are already implemented in production and move on. Usually it works OK, always it is disruptive, occasionally there are undetected problems due to configuration variance. Being able to quickly identify the differences between the run time environment and the development environment is obviously helpful in such efforts.
In addition, the exercise will usually help to eliminate the direct use of production data in development. This could even make the security officer smile.
Unit Test Environments
One especially undesirable occurrence for a developer is loss of work. Power fails, hardware fails or for other reasons computers do stop responding and data, even metadata can be lost if a hard reset is necessary. Computers used in development environments are notorious for such failures. When it happens, a day’s work, even a weeks work can easily be lost in spite of good source control practices. Capturing all the versions made during the work day automatically and asynchronously to a local archive – in a way that will not impact the developers work, but as the change happens can help prevent such lost work.
An ideal Unit Test scenario would preserve the changes throughout the day. The developer would need to take a small amount of time to occasionally purge the automatically created history instead of the much more painful alternative of having to recreate a previous version for a table or stored procedure. The previous work would remain available for review and reuse. In this local system unit test scenario there would be little need to save the metadata history for long periods of time. Many shops expect at least daily check-in to source control for all work in progress. Once the development cycle is completed, or as otherwise safe and necessary, the local archive can be purged as desired. The SQL activity on the unit test system is often such that a complete history of the TSQL changes on the SQL Instance would not require massive amounts of storage. The storage requirement would typically be easily accommodated within the disk storage allocated to any properly sized SQL Server Instance.
Post –Production Environments
Data quality is of utmost importance for Data Warehousing, BI and data mining data stores. The metadata and configuration for post production SQL Servers must not vary for this reason alone. The requirements to monitor metadata and configuration change are much like those for live systems. In some shops, post-production SQL Server databases are also likely to use a separate and different software lifecycle development environment than the production development team. Having an ability to easily compare metadata and configuration between the different development software lifecycle regions would be a front line defense against the tendency for the regions to drift out of sync. For the BI folks, there are few things more disruptive than unexpected source system changes that break the ETL process.
Centralized Administration
Making sure that everyone is working from a common base is essential to effective database administration and also enables the highest quality development and the least disruptive change process possible for the organization. The common base requires an ability to apply role based security.
SQL Server DDL Events
Fortunately, beginning with Microsoft’s SQL Server 2005 the ability to identify change includes some new helpers. One small but welcome example is the finally renewed fervor to populate the forever senseless refdate column in dbo.sysobjects (albeit in a back door fashion by morphing it from a view of the modify_date column in sys.objects). Prior to SQL Server 2005 ALTER statements (ALTER TABLE, ALTER PROCEDURE, etc) did not update the catalog even with the date. There was always this totally bogus column in dbo.sysobjects pretending to pronounce the date and time the object was last changed. Now at least the column is correct. There is also now some metadata change information in the default trace and improved trace and event audit capabilities and even enhancements to the built-in audit features including Login Auditing, Common Criteria Compliance (after SP2) and C2 auditing. Other easily accessible examples are DDL triggers, DDL event notifications and Trace event notifications, the DMVs and extended events. These event based mechanisms provide a means to capture a fairly complete picture of metadata changes as it happens.
All that is left for the DBA is to develop a robust, reliable and low maintenance way to store and use this wealth of new data. It sounds simple: slap a DDL trigger on a server that writes change events to a table. The DBA need only keep a query handy that will show the changes for the last hour or day and Viola! A quick tool to see what changed is at hand. I agree. If you are running SQL Server 2005 there is no excuse not to have at least this rudimentary protection. Having the trigger is better than not having the trigger, unfortunately there are a few of the limitations to this scenario:
Finding time and budget to develop and test the trigger code
May miss changes that don’t raise trappable events (e.g. encryption hierarchies) or at least don't fire the DDL trigger (e.g. permissions)
Multiple SQL Server instances adding to the complexity and time necessary for looking in all place
Resource requirements and management such as storage and security
Time requirements to analyse and build the reporting and maintenance tools necessary for optimal use of the data collected
Rollback/Recovery support to identify not only what changed but also what it looked like before the change
Undoubtedly you can think of more.
Obviously, SQL Server’s new found ability to announce change comes at some cost. Far and away, the biggest cost is the additional demands on the already over extended time of the DBA, operations and development staff to learn about, develop, test and implement these new capabilities. There is also the reality that others may not know how to use change identification tools if the DBA is not present. And there is some uncertainty that all changes are identified or that the approach taken may adversely affect the application that the monitoring is intended to protect.
After some deliberation and several years of hands on experience with the problem of identifying SQL Server metadata and configuration changes I wrote Bill Wunder’s DDL Archive Utility to aid administrators of SQL Server 2007 and 2000 installations in monitoring those changes.
Many SQL Server DBAs have downloaded this DDL Archive Utility from SQL Server Central (http://www.SQLServerCentral.com), The SQL Server World Wide User Group (http://www.sswug.org ) and from my web pages (http://www.nyx.net/~bwunder) to automatically capture a daily history of changes to the stored procedures, tables and other database objects found on targeted SQL Server 6.5, SQL Server 7, SQL Server 2000 and even SQL Server 2005 and SQL Server 2008 SQL Server instances. The wide and ongoing interest in that tool speaks to its necessity in effective database administration.
Over time, and in lock step with the improved capabilities of the Visual Studio IDE, the DDL Archive utility evolved from a COM/API based command line batch utility into a robust and more configurable .NET application with console capabilities.
For better or worse, the DDL Archive Utility stored its change history in Visual SourceSafe.
While it has proven to be a useful and reliable tool, there are limitations to the DDL Archive Utilities value beginning with SQL Server 2005. The Archive utility is based on Microsoft’s Database Management Objects (DMO) database administrative programming interface for SQL Server. DMO is a COM technology that does not work with configuration new with SQL Server 2005 such as Service Broker, the encryption hierarchy, http endpoints, the xml data type and of course .NET assemblies. We must use SQL Management Objects (SMO) - Microsoft’s replacement for DMO - to gain adequate access to the metadata that is new with SQL Server 2005. SMO works well with SQL Server 2005, fully exposes the SQL Server 2000 database engine metadata, and is also the administrative programming interface for SQL Server 2008.
To begin with, let’s face facts. Some of the permissions and access rights DBAs need to monitor are risky from the perspective of today’s more security conscious systems. And some of the data exposed during such monitoring is sensitive and valuable property of the organization.
The DDL Archive Utility installation has several dependencies including Visual SourceSafe - via a COM Interop, SQLXML, a relatively large file system staging area, and a licensed instance of SQL Server or some other scheduling mechanism. The DDL Archive Utility is also by necessity a somewhat resource intensive and therefore inefficient process that scripts every object on the database instance and then compares each to the latest version in the SourceSafe repository each time the schedule archive is executed. Since most database objects don’t change this requires a relatively large amount of CPU cycles and network bandwidth to identify the few changes that may have happened since the last time a check for changes occurred.
The DDL Archive Utility employs Distributed Management Objects (DMO) scripting technology to produce the files that are compared to the SourceSafe repository. SQLDMO is not aware of any database features new in SQL Server 2005. Consequently it cannot completely archive a SQL Server 2005 instance.
After much more deliberation and several more years of experience with the needs and capabilities for monitoring change, I have created an upgrade path for all those folks using the DDL Archive Utility that wish to enjoy the same level of safety and security with SQL Server 2005 and SQL Server 2008 that they have realized with the Archive Utility on previous SQL Server releases. Furthermore, DBAs with no prior experience with the DDL Archive utility can easily begin monitoring for changes on any SQL Server 2000, SQL Server 2005 or SQL Server 2008 database server instance with the newly architected tool: SQLClue
SQLClue is adaptive, offering a suite of tabbed functions that can be enabled and disabled as desired. Each function is defaulted to a recommended configuration yet each component is configurable to match different requirements. Each component adapts to the environment as needed. Regardless of the SQLClue components used, monitoring with SQLClue adds only a very small administrative time cost and system performance impact.
To install and use SQLClue the prerequisite are the .NET Framework 3.5 and SQL Server 2005 or SQL Server 2008 running on the host server. SQLClue uses this host SQL Server as the SQL Configuration Repository for the archive of changes instead of Visual SourceSafe. The SQL Server can be any edition including SQL Server Express Edition. This means there need be no implicit additional licensing costs to use SQLClue. SQLClue relies on SQL Server Management Objects (SMO) to document the metadata changes. The changes documented provide an archive that is unprecedented in its completeness. For scriptable objects, a script is stored in the repository. For metadata that is not scriptable, the SMO properties are extracted from the server and stored as name-value property listing. Asynchronous Service Broker based Event Notifications are used to capture the vast majority of likely changes as they occur. Then at a configurable schedule interval, the events are processed into the repository. Those few metadata types that do not generate an event when changed are still routinely documented and compared one at a time to the latest version in the repository. However, the overall CPU and network performance load is a small fraction of that required y the DDL Archive Utility because the object types that change the most do generate events. At any time it is possible to peek into the local Event Notification Service Broker Queue to see event triggering changes since the last archive to the repository. In fact, at the discretion of the user, the Service Broker Queues can be con
SQLClue includes a richly user configurable comparison tool with Regular Expression support for object name matching and documentation line level filtering. The possibilities are nearly unlimited.
The DDL Archive Utility provided a tool to Script database object DDL and exploits the add, check-in , version compression algorithm and security model of Visual Source Safe to maintain a versioned history of DMO database object scripts. SQLClue includes the SQL Configuration Archive that replaces this functionality and expands it to include CLR, Service Broker and other SQL Server 2005 database objects. The SQL Configuration Archive also monitors non-scriptable database metadata for change. SQLClue goes further by providing a framework for creating a Database Monitoring Strategy using secure and searchable SQL Server based Data Center Runbook of peer reviewed administration and operation protocols, practices, and procedures to facilitate and manage the sharing of database related knowledge and information and assure business continuity accross the data layer.
In summary, there really is little comparison between SQLClue and Bill Wunder’s DDL Archive Utility. While SQLClue does everything that the Archive Utility could do for SQL Server 2000, SQLClue provides features and functions for the newer generations of SQL Server. The DDL Archive Utility had one function and many external dependencies. SQLClue has multiple functions – part of one function replacing what the DDL Archive Utility – and no external application dependencies other than SQL Server.
Try SQLClue. I am quite certain you will find it to be an indespensable tool with a myriad of uses. I invite you to download the freely licensed SQLClue Workstation. Included in the download are upgrades to full versions of the SQL Configuration Archive, Data Center Runbook and Encryption Hieararchy Backup & Restore components that require a local SQL Server 2012 instance.