MERGE – An UPSERT Example

I recently had to do an UPSERT. (Yes, it's a real word: UPdate + inSERT.) This would normally be an UPDATE followed by an INSERT wrapped in a transaction so that if either fails it all rolls back. This means you make at least two passes through the data, locking it until the transaction commits. Since this is logically a single operation, wouldn't it be nice to have a single construct that could handle the UPDATE, the INSERT, and maybe even a DELETE if necessary? That's exactly what we have in the MERGE statement. SQL Server Books Online gives a good explanation of the syntax and a few examples, but it can get very complex very quickly.

USE tempdb
GO
    IF (OBJECT_ID('dbo.#Target')) IS NOT NULL 
        DROP TABLE #Target; 
        
    IF (OBJECT_ID('dbo.#Source')) IS NOT NULL 
        DROP TABLE #Source;        

    --Dupsert this table...
        CREATE TABLE #Target 
            (
                TargetID    INTEGER    IDENTITY PRIMARY KEY NOT NULL
                ,DataCol    CHAR(10)    NOT NULL
            );
        INSERT INTO #Target (DataCol) 
            VALUES ('Static'),('Update Me')
            ,('Delete Me')
            ;
        
    --...with changes from this one.
        CREATE TABLE #Source 
            (
                SourceID    INTEGER    IDENTITY(101,1) PRIMARY KEY NOT NULL
                ,TargetID    INTEGER    NULL 
                ,DataCol    CHAR(10)    NOT NULL
            );
        INSERT INTO #Source (DataCol,TargetID) 
            VALUES ('Static',1),('Change',2),('New',NULL);
        
    --View tables    
        SELECT * FROM #Target;
        SELECT * FROM #Source;

    --Dupsert the target using MERGE
        MERGE 
            INTO #Target 
            USING #Source
                ON    #Target.TargetID = #Source.TargetID
            WHEN MATCHED THEN UPDATE 
                SET #Target.DataCol = #Source.DataCol
            WHEN NOT MATCHED BY TARGET THEN 
                INSERT (DataCol)
                VALUES (#Source.DataCol)
            WHEN NOT MATCHED BY SOURCE THEN 
                DELETE;

    --View target    
        SELECT * FROM #Target;

As you can see, the code is a logical unit, straightforward, and flexible.  Enjoy!

Performance Tuning – An Example

Recently we had some trouble with a stored procedure taking between 1.0 and 1.5 seconds, with occasional runs of 8 or 9 seconds. This procedure is heavily used by applications hitting a busy database and is called as often as once or twice a second. It returns six recordsets with a total row count of 50 or less.

This stored procedure is moderately complex and structured as follows: temporary data structures are created and populated by successive calls to the database, and then the data is modified slightly and returned. The procedure accepts up to 16 input parameters, with one or two "signatures" predominant. For example, looking up records by Customer_ID was the most common use for the procedure, but it was also possible to do lookups based on a set of hashed TINs, first and last name, etc. If null values were passed in for the parameters, the procedure still proceeded in a top-down fashion without any logic to skip unnecessary queries. Each database call involved joining on a dozen code tables using the NOLOCK query hint.

The Query Plan

We began by examining the query plan. This is a representation of the physical steps SQL Server uses to provide the data you request in a query. Often a DBA can use this plan to discover where SQL Server is spending a lot of time and take action to correct it. The plan can be expensive to create, so once calculated it is stored in SQL Server's plan cache until manually cleared or SQL Server is restarted. As long as the cached plan suits the workload this is a great efficiency; however, the input parameters can have a significant effect on the logic of the query and consequently the cost. Suppose the plan was first calculated based on a list of hashed TINs and stored in the plan cache. If the next call is based on Customer_ID, the first query plan will be used even though it may not be optimal for the new input parameters. This is known as a parameter sniffing problem and is discussed in more detail here.
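
If you want to see what is actually sitting in the plan cache, a DMV query like the sketch below will show the cached plans and how often each has been reused (the procedure name in the filter is hypothetical):

    --How many times has each cached plan been reused, and what does it look like?
    SELECT  cp.usecounts
            ,cp.objtype
            ,st.text
            ,qp.query_plan
    FROM    sys.dm_exec_cached_plans AS cp
            CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
            CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    WHERE   st.text LIKE '%usp_GetCustomer%';   --hypothetical procedure name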

We believed this was part of the reason for the poor performance. There are at least two ways to deal with a situation like this:

•    Restructure the procedure into modules (sub-procedures) so each module has its own execution plan. This introduces some maintenance complexity but guarantees that each logical path has an efficient query plan.

•    Recalculate the plan each time you call the procedure with the current parameters. The drawback, of course, is that you pay the cost of recalculating the plan on every call.

In this case the plan is not overly complex and it seemed worthwhile to recalculate the plan each run by altering the procedure to use the WITH RECOMPILE option. This brought the average run time down to 550 ms with no executions taking longer than two seconds.
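
The change itself is just one line in the procedure header. Here is a minimal sketch with a hypothetical procedure name, parameter, and table:

    ALTER PROCEDURE dbo.usp_GetCustomer   --hypothetical name
        @Customer_ID INT = NULL
    WITH RECOMPILE                        --build a fresh plan on every execution
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT  c.Customer_ID
                ,c.LastName
                ,c.FirstName
        FROM    dbo.Customer AS c         --hypothetical table
        WHERE   c.Customer_ID = @Customer_ID;
    END;

If only one statement suffers, adding OPTION (RECOMPILE) to that statement is a lighter-weight alternative that leaves the rest of the procedure's plan cached.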

NOLOCK Query Hints

SQL Server locks data during inserts and updates. These locks prevent other users from querying that same data until the transaction is complete, which can cause a query to take longer to run. Telling the query optimizer to use the NOLOCK query hint allows the database engine to read the data even if a lock would normally prevent it. One danger in using this hint is that it allows a query to return potentially incorrect information. Another more subtle and serious consequence comes from meddling with the query optimizer. The optimizer is a very sophisticated piece of code and is far better at making choices for optimizing data retrieval than any single DBA can ever be. Query hints should always be a last resort, used only after detailed testing. See here for additional details.
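
For reference, the hint is applied table by table, so the joins in a procedure like ours end up looking something like this sketch (table and column names are made up):

    SELECT  o.Order_ID
            ,o.Order_Date
            ,s.Status_Desc
    FROM    dbo.Orders AS o WITH (NOLOCK)          --dirty read of the base table
            INNER JOIN
            dbo.Order_Status AS s WITH (NOLOCK)    --same hint repeated on a code table
                ON o.Status_ID = s.Status_ID;

Removing the hints is just a matter of deleting the WITH (NOLOCK) clauses and letting the default READ COMMITTED isolation level do its job.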

This procedure used the NOLOCK hint 57 times, even on code tables that are almost never modified. We removed these NOLOCK hints and saw a modest improvement of about 40 ms, bringing our average run time to 515 ms.

Temporary Data Structures

The next area of interest is the temporary data structures. The original developer chose to use table variables rather than temporary tables. One significant difference between these structures is that the table variable is thought to reside only in memory while the temporary table is written to tempdb, SQL Server's internal working database. I say "thought to reside only in memory" because, while that is generally true, SQL Server will write table variables to tempdb if it sees fit – as it did in this case. There are cases where tempdb contention is a problem and choosing a table variable over a temporary table makes sense, usually when the amount of data you are storing is very small. Based on our experience in similar situations we decided to refactor the procedure to use temporary tables rather than table variables. This brought our average run time down to about 90 ms.
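
If you haven't compared the two, the refactoring amounted to swapping declarations like these (the column list is a simplified, hypothetical stand-in for the real one):

    --Table variable: no statistics, so the optimizer assumes very few rows
    DECLARE @Work TABLE
        (
            Customer_ID INTEGER     NOT NULL
            ,DataCol    CHAR(10)    NOT NULL
        );

    --Temporary table: lives in tempdb, gets statistics, and can be indexed after creation
    CREATE TABLE #Work
        (
            Customer_ID INTEGER     NOT NULL
            ,DataCol    CHAR(10)    NOT NULL
        );
    CREATE INDEX IX_Work_CustomerID ON #Work (Customer_ID);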

Other Opportunities

So far the steps taken have all been very low risk, and no changes to the logical structure of the query have been made. We believe additional improvements could be realized by altering the flow of the procedure to bring back less data on the intermediate database calls and to eliminate unnecessary calls based on the parameters submitted (see the sketch below). However, since performance went from 8 or 9 seconds down to a tenth of a second, we decided to leave well enough alone and revisit this later if necessary.
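
A sketch of what that flow change might look like, using hypothetical parameters, tables, and work table:

    IF @Customer_ID IS NOT NULL
    BEGIN
        --Most common signature: look up by Customer_ID only
        INSERT INTO #Results (Customer_ID, DataCol)
        SELECT  c.Customer_ID, c.DataCol
        FROM    dbo.Customer AS c
        WHERE   c.Customer_ID = @Customer_ID;
    END
    ELSE IF @Hashed_TIN IS NOT NULL
    BEGIN
        --Only runs when the hashed TIN signature is used
        INSERT INTO #Results (Customer_ID, DataCol)
        SELECT  c.Customer_ID, c.DataCol
        FROM    dbo.Customer AS c
        WHERE   c.Hashed_TIN = @Hashed_TIN;
    END;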

As you can see, SQL Server tuning is as much an art as a science. It involves observation, experimentation, and testing. Sometimes there's a single silver bullet; other times you take a more iterative approach, as we saw in this example.

Run SQLCMD Scripts Using PowerShell

Many of our production DBs are restored nightly onto a development server, which is a great way to test your backups and to have production data available to developers. In order to do this "Prod on Dev", though, you have to recreate any DB objects you made yesterday because they got stomped in the refresh. You could make one massive script for all the objects and run it first thing every morning, but if you are creating many objects maintenance can become complex. A semi-automated solution is to put each object in its own file and use SQLCMD to run each file, either manually or using the Windows Task Scheduler. Better, but any time you add an object you have to edit the control script. How about putting each object in a file in the same folder and using PowerShell to get the file names and run each one using SQLCMD? The PowerShell script below does just that. Schedule it to run from Windows Task Scheduler and you are ready to refresh as needed.

################################################
# Get a list of sql files in a folder and run
# them in alphabetical order using SQLCMD.
################################################

#Setup
cls
$Folder = "C:\Temp\Source\"
$SQLInstance = "MyServer\MyDB"

#Get file names to run
$FileNames = Get-ChildItem -Path $Folder -Filter *.sql -Name | Sort-Object

#Run SQLCMD for each file
ForEach ($FileName in $FileNames)
{
    $File = $Folder + $FileName
    sqlcmd -S $SQLInstance -i $File
}

As you can see, the code is quite simple but let’s take a closer look and consider some potential enhancements.

In #Setup we are clearing the screen and setting variables for future use. These values could easily be input parameters or the whole thing could be turned into a function.

In the #Get file names to run section we retrieve a list of files in the source folder and sort them alphabetically. If they need to be run in a certain order (such as tables before procedures) you can add an index to the file name (010_MyTable.sql). If the files are under source control, say TFS, you could use PowerShell to run TF.exe to get the latest version of the files. Running from the command line in PowerShell is very straightforward, as you will see in the next section.

Finally, the #Run SQLCMD for each file section loops through each file and runs it using the SQLCMD utility. You could easily add logging and notifications if you have the need.

As you can see, you can automate tedious tasks quickly and easily using PowerShell and SQLCMD.

SQL Server & PowerShell – Finding and Changing Job Owners

We've been hearing for a long time that PowerShell (PS) is the wave of the future for SQL Server administration. Under the covers SSMS itself is powered by PS, so there ought to be benefits, but how do you get started? Everybody has seen how you can right-click a node in the SSMS Object Explorer and jump right into PS.

But then what?

Let’s consider a small administrative task. Fred creates a bunch of jobs without setting the job owner so he owns them all. He decides to seek greener pastures and when his account is dropped the jobs quit working.

I know you all have (or should have) a script for finding and changing the owner of certain SQL Agent jobs on a SQL instance. They all use a cursor (or something like it) to query msdb.dbo.sysjobs and step through the results, executing msdb.dbo.sp_update_job where appropriate (a sketch follows). This certainly works but it's kind of a kludge. Wouldn't it be more intuitive to say something like "Look at all the jobs; where 'Fred' is the owner, change the owner to 'sa'"?
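
For comparison, here is a sketch of that traditional T-SQL approach (the owner names are illustrative):

    DECLARE @JobName sysname;

    DECLARE JobCursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT  j.name
        FROM    msdb.dbo.sysjobs AS j
        WHERE   SUSER_SNAME(j.owner_sid) = 'Fred';   --the departing owner

    OPEN JobCursor;
    FETCH NEXT FROM JobCursor INTO @JobName;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC msdb.dbo.sp_update_job
            @job_name = @JobName
            ,@owner_login_name = 'sa';               --the new owner
        FETCH NEXT FROM JobCursor INTO @JobName;
    END;

    CLOSE JobCursor;
    DEALLOCATE JobCursor;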

That’s really all you do in PS and you can find examples from Mr. Google. However, PowerShell’s ability to use shortcuts and abbreviations often makes figuring out what’s really happening like making sense of alphabet soup.

Here are two snippets of code. The first lists all the jobs on an instance with their owners and the second finds all the jobs with a certain owner and changes that owner to another. Let’s unpack each and see what’s going on.

dir | FT Name, OwnerLoginName -auto

"dir" is an alias for "Get-ChildItem" (also abbreviated "GCI"). It functions like the "dir" command at the DOS prompt – it looks at the current location and returns everything "below". The important difference is that it doesn't return a string about the objects, it returns the objects themselves. Keep that in mind as we go on.

"|" is the pipe character, which represents the PS pipeline and hands the object(s) to the next command. In our case we are passing all the objects returned by Get-ChildItem to the next command. See this link for more information on the pipeline.

"FT" is short for "Format-Table" and is used to format output. The comma-separated arguments "Name" and "OwnerLoginName" tell FT which columns to return, and "-auto" automatically prettifies the output for human consumption.

Open PowerShell from the SQL Agent Jobs node as described above, paste this line of code into the PowerShell window, and hit "Enter" to see it for yourself.

Now that we can see who owns each job, let's change some owners.

dir | ?{$_.OwnerLoginName -eq "Fred"} | %{$_.OwnerLoginName = "sa"; $_.Alter(); $_.Name}

"dir |" functions as above, but remember the actual objects are being passed, not just a property value (name, owner, etc.). In this case we don't need all the jobs, just the ones Fred owns.

"?" is an alias for the "Where-Object" cmdlet, which filters for the objects you want. The curly brackets ("{}") contain the script block where you put the filter logic. "$_" refers to each object coming through the pipe, and "OwnerLoginName" is the property examined as it passes through the filter. "-eq" is the equality operator (=) and "Fred" is the string it is trying to match.

After this, the objects that met the filter criteria are piped to the next command. Again we have an alias, "%". This is short for "ForEach-Object" and allows you to iterate over the objects and change properties or execute methods. In this case we set ("=") the "OwnerLoginName" value to the string "sa". The "Alter()" method call saves the change back to the server. Finally, "Name" prints out the name of each job changed.

As you can see, PowerShell uses an intuitive, object-oriented approach to working with SQL Server objects, though sometimes shortcuts and aliases can make it a little hard for beginners to interpret. It's well worth the time to take a careful look at any scripts you download so you know exactly what's going on. PowerShell has a very nice help system that allows you to do this research. Look here if you'd like some help on Help.

Comparing Multiple Files Using PowerShell

I am moving a number of SSIS configuration files from one folder structure to another as part of a SQL Server upgrade. Both folder structures contain the same files, but the files point to different file drop locations, SQL instances, etc. My task was to compare the configuration settings in each environment and map all the changes from the old environment to the new.

Rather than opening each file and visually checking for changes I went with a PowerShell approach centered on the Compare-Object cmdlet. This cmdlet compares two objects and produces a comparison report. This will work whether the two objects are folder structures, text files, or other objects.

My approach was to start with two folders containing .dtsconfig files. First compare the two sets of files and return a list of files that are in both folders. Then, file pair by file pair, open each file and compare contents. Write any differences to an output file. See the comments below for more details.

cls
# Setup
$Path1 = "C:\Path1"
$Path2 = "C:\Path2"
$OutFile = "C:\ComparisonOutputFile.txt"

# Delete outfile if it exists
If (Test-Path($OutFile)) {Remove-Item $OutFile}

# Write the two paths to the outfile so you know what you're looking at
"Differences in files of the same name between the following folders:" | Out-File $OutFile -Append
$Path1 + " AND " + $Path2 | Out-File $OutFile -Append
"--------------------------------------------" | Out-File $OutFile -Append

# Compare two folders and return only files that are in each
$Dir1 = Get-ChildItem -Path $Path1 -Filter *.dtsconfig
$Dir2 = Get-ChildItem -Path $Path2 -Filter *.dtsconfig
$FileList = Compare-Object $Dir1 $Dir2 -IncludeEqual -ExcludeDifferent

# Loop the file list and compare file contents
ForEach ($File in $FileList)
{
    $F1 = $Path1 + "\" + $File.InputObject.Name
    $F2 = $Path2 + "\" + $File.InputObject.Name
    $File.InputObject.Name | Out-File $OutFile -Append
    Compare-Object $(Get-Content $F1) $(Get-Content $F2) | Format-Table -AutoSize | Out-File $OutFile -Append
}

As you can see, PowerShell provides significant functionality with very little effort. I was particularly impressed with the comparison report. The SideIndicator column in the output file is a simple way to show which folder each differing line came from.

While it isn't a fancy GUI, it is functional, visual, extensible, and built in. The extensibility is important because you don't get that from a GUI. Suppose I wanted to keep a "Gold Copy" of my config files and run a daily comparison against production to pick up changes? The code above is almost custom written for that process – all it needs is an email step. Further, you could use a similar process with SMO to script out database schemas and do your own comparisons. The possibilities are truly endless.

Why I Am, And Want To Remain, A Consultant.

I have been in the IT world for over 15 years, as both a full-time employee (FTE) and as a consultant. I have worked for a major IT consulting firm, local contracting companies, and fly-by-night operators. Recently I had the opportunity to go independent so my wife and I formed an S-Corp, hired an accountant, and dove in.

As I discuss consulting opportunities with head hunters and potential clients I continue to hear the siren song of coming back into the fold and being an FTE – either for a client or as a W-2 employee of someone else’s consulting company. As you might imagine this has created a great deal of soul searching and a desire to settle the issue once and for all. I began to ponder the question “Why am I a consultant?” Answers to follow.

“Consultants make more money” – In general I believe this is true but there are two things to consider. First, consultants take a larger risk so it’s only reasonable they should reap a greater reward. Second, companies pay far more for FTEs than simply their salaries – Social Security taxes, healthcare insurance, other insurance, etc. All these expenses are borne by the consultant out of their higher hourly rate. I’m not saying consultants don’t make more, only that the difference is not as great as one might think. Further, if time is money, then the amount of time my sainted wife has spent dealing with the government’s ridiculous administrivia needs to be taken into account. I may make a little more but it comes at a cost – there are easier ways to make a living.

“Consultants are their own boss” – What exactly do you mean “boss”? Do you mean I set my own hours? No more than anyone else in the organization I am working in. If everyone else has flex-time I probably have it as well. If the rules are 8-5 with an hour for lunch, guess what my hours are? If the pressure is on for a major release the boss is every bit as likely to bite my head off as the FTE in the next cubicle – maybe more so. I do have the choice to accept a contract or not but that is no more than the right to starve. Besides, every FTE has the same freedom to say “no” – they just have to find a new gig but they can always say “no”.

“Consultants don’t have to deal with politics” – Some do, some don’t. In my own case this has usually been true but I’m sure there are plenty of horror stories on that subject.

For me, the one major reason I am and want to remain a consultant is…

“Being a consultant is fun” – The constant pressure of making and keeping myself marketable keeps me studying and exploring in my field. I know from experience that as an FTE I get fat, dumb, and lazy and I fight a tendency to coast and complain. I am not saying this is the case for all FTEs, I speak only for myself. Some FTEs are fat, dumb, lazy, and dishonest – just like some consultants. Those are character issues that are independent of whether or not you get a W-2 or a 1099. Because I am concerned with staying marketable I learn new things – which I enjoy. When I enjoy what I’m doing I look forward to the challenges and to new opportunities to learn. It’s a synergistic cycle that makes me happier at work and happier at home. I want to be the best I can be at what I do – to take care of my family and to find pleasure in my work. For me, for now, a consultant is what I am and what I want to remain.

Using A SQL Server Database Snapshot As A Development Tool

I am modifying a number of SSIS packages that extract, transform, and load data into various tables. While unit testing I need a quick way to refresh my development environment. The traditional method is to take a backup and restore, but if the database is large that can take a significant amount of time – even a ten-minute restore drags on forever if you have to run it a dozen times a day. Another option is to duplicate the data in separate tables and then kill and fill to refresh the data. This can run quickly (depending of course on table sizes, indexes, etc.) but it can be complex to set up. My favorite approach is to create a database snapshot as my backup, run the load, evaluate the results, then restore from the snapshot – quick and easy to set up. (If you are not familiar with database snapshots here is a good overview.)

Quick

Due to their sparse file architecture, DB snapshots are created instantaneously; all that gets created initially is an empty file. As changes are made to the original, the altered 8KB pages are written to the snapshot file with no logging – in fact the snapshot doesn't even have a transaction log. When you restore from the snapshot those saved pages are simply copied back to their original locations. If your changes are small in terms of 8KB pages compared to the total DB, you will find this lightning fast.
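
Before we get to the generator below, here is the basic shape for a hypothetical single-file database named MyDB (the logical file name and path are assumptions):

    --Create the snapshot: instantaneous, just an empty sparse file at first
    CREATE DATABASE MyDB_SS
    ON (NAME = MyDB, FILENAME = 'C:\SQLData\MyDB_SS.ss')
    AS SNAPSHOT OF MyDB;

    --...run the load, evaluate the results...

    --Put everything back the way it was
    RESTORE DATABASE MyDB
    FROM DATABASE_SNAPSHOT = 'MyDB_SS';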

Easy

A DB snapshot is as easy to create as a regular DB. The only complexity is that if your DB has multiple files, your snapshot must as well. Never fear, though: I have a code generator that will handle all the gory details; see below. Please be sure you read the comments at the top so you know how to run it. There is also code at the end of the script to restore to the original state as well as to drop the snapshot.

Caveats

Restrictions

  • Snapshots are only available in Enterprise Edition.
  • The snapshot is read-only, which shouldn't be a problem when it is used only for restores.
  • Creating a Snapshot requires CREATE DATABASE, CREATE ANY DATABASE, or ALTER ANY DATABASE permission.

Size Issues

Normally DB snapshots don't take up much space when used this way, but be sure to clean up after yourself! The snapshot keeps collecting changed pages, so if you forget to drop it, it can grow toward the size of the original as changes accumulate.

Exclusive Use

You are not required to have exclusive use except for the restore, but if your buddies are using the original DB when you do the restore, they may not be your buddies anymore. Remember, with great power comes great responsibility.

 

Code

I apologize for the code formatting issues; WordPress and I are not getting along. :-( 

/****************************************************************
This script generates a script to create a snapshot of the source DB and will
handle multiple files in the DB.  It also supports multiple snapshots of the
same source by changing @SSName.

      Set @SourceDBName to the name of the DB you want a snapshot of.
      Set @SSName to what you want to call your snapshot.

Ctrl+Shift+M will allow you to enter the DB and Snapshot names.

You can enable the Execute statement at the end if you want to build the
snapshot now.
****************************************************************/
Declare
      @SourceDBName As sysname
      ,@SSName As sysname
      ,@SQL as Varchar(4000)
Select
      @SourceDBName = '<SourceDBName,sysname,Essence>'
      ,@SSName = '<SSName,sysname,EssenceSS>'

SELECT
      @SQL =
      'CREATE DATABASE ' + @SSName + ' ON '
            + LEFT(Script,LEN(Script)-1)
            + 'AS SNAPSHOT OF ' + @SourceDBName
FROM
      (SELECT
            (
                  SELECT col + ', ' as [text()]
                  FROM
                        (
                              Select Top 100
                                    Case
                                          When Right(mf.physical_name,4) = '.ndf'
                                                Then '(NAME = ' + mf.name + ' ,FILENAME = '''
                                                      + Replace(mf.physical_name,'.ndf','_' + @SSName + '.ss') + ''')'
                                          Else '(NAME = ' + mf.name + ' ,FILENAME = '''
                                                + Replace(mf.physical_name,'.mdf','_' + @SSName + '.ss') + ''')'
                                    End               As col
                              From
                                    sys.master_files  As mf
                                    Inner Join
                                    sys.databases     As db
                                          on mf.database_id = db.database_id
                              Where
                                    db.name = @SourceDBName
                                    And
                                    mf.type_desc = 'ROWS'
                              Order By
                                    [file_id]
                        ) As A
                  ORDER BY col ASC
                  FOR XML PATH('')
            ) AS Script
      ) B

--Execute (@SQL)
Print @SQL
--------------------------------------------------------------------------------------------
--Other useful code
      /*Restore from snapshot**********************************
            Use Master
            Go
            Alter Database <SourceDBName,sysname,Essence>
                  Set  RESTRICTED_USER With ROLLBACK IMMEDIATE
            Go
            Restore Database <SourceDBName,sysname,Essence>
            From Database_Snapshot = '<SSName,sysname,EssenceSS>'
            Go
      ******************************************************/
      /*Drop Snapshot*****************************************
            Drop Database <SSName,sysname,EssenceSS>
      ******************************************************/