A mock too far? What is good practice for mocking database objects with tSQLt?

by Greg M Lucas on 28 March 2012

tSQLt, the open-source unit testing framework for SQL Server 2005 and later, has some really useful features that allow database objects to be mocked, dramatically reducing the time needed to set up reference data for each test.  But sometimes, just because you can do something doesn't mean that you should; at least not all the time.  In this post, I would like to open a discussion to try and identify good practice with regard to when we should and should not mock database objects.

If you are familiar with mocking frameworks such as NMock or Rhino Mocks, you will understand the concept and usefulness of "mock" objects based on interfaces.  Using a mock object allows you to work with a facsimile of the real object: the mocking framework lets you test the inputs and pre-define any outputs without the real object ever actually being executed.  You are probably aware that tSQLt also supports the ability to mock certain types of database objects but, since database programming is not object-oriented, there may be trade-offs.  I want to explore what that means for the way we should, or do, write database tests.

This discussion has been prompted by one specific, relatively small limitation in the tSQLt framework (as of v1.0.4462.23207), but even when that is resolved in some future release I still think that this discussion will remain relevant.

I should point out that I am in no way slating tSQLt; I think it is an excellent framework and use it regularly on a variety of projects.  It is open source, and Sebastian Meine (blog | twitter) and Dennis Lloyd Jr (blog | twitter) are working hard to add and improve features.  They're also really nice, helpful guys, so if you do identify a bug or a new requirement you can let them know here; they're always quick to respond.

A Quick tSQLt Tutorial

In this section we briefly cover the tSQLt features that are pertinent to this discussion; if you are already familiar with them, feel free to skip ahead.

  • FakeTable – allows a test writer to create a copy of a real table for test purposes – effectively a mock table.  The fake table has the same column structure as the real table, except that every column allows NULL.  This can dramatically reduce the effort required in test setup and also helps reduce future test maintenance.  Defaults, auto-incrementing (IDENTITY) and computed columns are also supported in the current version.
  • ApplyConstraint – allows a test writer to apply a foreign key or check constraint to a fake table (but not currently primary or unique keys).  In the current version, only single-column keys are supported and there is no support for cascading deletes or updates.  A minimal usage sketch of both features follows this list.
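To make the calls concrete, here is a minimal sketch of how the two features are used inside a test.  The table and constraint names (dbo.Customer, FK_Customer_MarketingSource) are borrowed from the example later in this post.

```sql
-- Mock the table: same columns as dbo.Customer, but all NULLable.
-- @Identity = 1 asks tSQLt to preserve the IDENTITY property on the copy.
EXEC tSQLt.FakeTable 'dbo.Customer', @Identity = 1;

-- Re-apply just the one constraint this test cares about to the fake table.
EXEC tSQLt.ApplyConstraint 'dbo.Customer', 'FK_Customer_MarketingSource';
```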

The Beginnings of an Idea

It was while writing Part 10 of my series on using tSQLt that I first discussed the fact that [ApplyConstraint] currently has no support for cascading deletes or updates.  This means that if you are writing tests against a cascading foreign key, you cannot use fake tables.  Depending on how you usually write your tests this may or may not be a problem, but it raised an interesting question for me about how certain types of database test should actually be implemented.

There are a number of circumstances when you might want to fake a table as part of a unit test, including:

  • The table has many NOT NULL columns which aren’t relevant to the test in hand
  • The table contains lots of standing reference data which is not required for a simple test
  • The table has numerous foreign key references to other tables, which may themselves reference other tables.

Under all these circumstances, faking the table under test will save time and effort setting up test data and will also reduce test maintenance in future.  When working with fake tables, you only ever need to insert data into the columns pertinent to the test.  Even if new NOT NULL columns are added to the table in the future, tests using the fake table will continue to work as all columns on a table created with [FakeTable] allow NULL.

There is no question in my mind that faking tables when writing Module Tests (i.e. tests of procedures, functions etc.) is a good thing to do.  If you are testing a module, you shouldn't really care about the underlying tables – just that they exist in sufficient form and function for the test in hand.

For example, if you are testing a procedure that inserts data into a table with an IDENTITY column and then returns the newly generated identity value as an output parameter, you just need a table with the right columns, one of which is auto-incrementing.  The (non-)existence of check and default constraints, primary and unique keys, or even whether some columns allow NULL, is of no consequence to the test in hand.  Those things should all have been covered by your Schema Tests (i.e. tests of tables, constraints, keys etc.).  Remember the principle of Separation of Concerns.  A sketch of such a module test follows.
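Something like the sketch below, say.  The procedure dbo.AddCustomer and its parameter and column names are hypothetical; the test asserts only that the output parameter matches whatever identity value the insert actually produced, so it makes no assumption about the identity seed.

```sql
EXEC tSQLt.NewTestClass 'testCustomer';
GO
CREATE PROCEDURE testCustomer.[test AddCustomer returns the new identity value]
AS
BEGIN
    -- Mock the table; @Identity = 1 keeps the auto-increment behaviour the procedure relies on.
    EXEC tSQLt.FakeTable 'dbo.Customer', @Identity = 1;

    -- Act: call the (hypothetical) module under test.
    DECLARE @NewId INT;
    EXEC dbo.AddCustomer @CustomerName = 'Test Customer', @CustomerId = @NewId OUTPUT;

    -- Assert: the output parameter matches the identity value the insert produced.
    DECLARE @ExpectedId INT;
    SELECT @ExpectedId = CustomerId FROM dbo.Customer WHERE CustomerName = 'Test Customer';
    EXEC tSQLt.AssertEquals @ExpectedId, @NewId;
END;
```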

However, I would like to raise the question of whether we should always be using fake tables for Schema Tests.  Let's take the schema below as a fairly typical testing scenario.

[Schema diagram: the Team, Employee, Customer, CustomerAddress, Address, CustomerType and MarketingSource tables and the foreign keys between them.]

FK_CustomerAddress_Customer, highlighted in the diagram, is a cascading delete, but the other foreign keys are defined as NO ACTION on update or delete.  Employee.EmployeeId, Customer.CustomerId and Address.AddressId are all auto-incrementing columns.  With the exception of [CustomerAddress], all the tables have a unique constraint on the natural key in addition to the synthetic primary key.

Let us assume that we want to write a test to prove that the foreign key between [Customer] and [MarketingSource] is enforced.  There are two options:

1) Write the test against mock schema objects.  This involves the following steps (a complete example follows the list):

  1. Use [FakeTable] to mock [MarketingSource]
  2. Use [FakeTable] to mock [Customer]
  3. Use [ApplyConstraint] to copy [FK_Customer_MarketingSource] to the fake table
  4. Write the test
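Put together, option 1 might look something like this.  The column name MarketingSourceId is an assumption about the example schema, and since the version of tSQLt discussed here has no built-in exception assertion, the expected failure is detected with a TRY/CATCH flag.

```sql
EXEC tSQLt.NewTestClass 'testSchema';
GO
CREATE PROCEDURE testSchema.[test FK_Customer_MarketingSource rejects unknown sources]
AS
BEGIN
    -- Steps 1-3: fake both tables, then copy the one constraint under test onto the fake.
    EXEC tSQLt.FakeTable 'dbo.MarketingSource';
    EXEC tSQLt.FakeTable 'dbo.Customer';
    EXEC tSQLt.ApplyConstraint 'dbo.Customer', 'FK_Customer_MarketingSource';

    -- Step 4: an insert referencing a MarketingSourceId that does not exist should fail.
    DECLARE @ErrorRaised BIT;
    SET @ErrorRaised = 0;
    BEGIN TRY
        INSERT dbo.Customer (MarketingSourceId) VALUES (999);
    END TRY
    BEGIN CATCH
        SET @ErrorRaised = 1;  -- expected: error 547, a constraint violation
    END CATCH

    IF @ErrorRaised = 0
        EXEC tSQLt.Fail 'Expected FK_Customer_MarketingSource to reject the insert, but it succeeded.';
END;
```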

2) Write the test against the real schema.  This involves the following steps (a sketch of the check-and-insert setup pattern follows the list):

  1. Check for a valid row in [Team], inserting a non-duplicate row if necessary.
  2. Check for a valid row in [Employee], inserting a non-duplicate row if necessary.
  3. Check for a valid row in [CustomerType], inserting a non-duplicate row if necessary.
  4. Check that there isn't a row in [MarketingSource], deleting an otherwise unused row if necessary.
  5. Ensure that the value to be added to [Customer] complies with the primary and natural keys and all foreign keys except the one to [MarketingSource].
  6. Write the test
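Each of those setup steps follows the same check-and-insert pattern.  Here is a sketch of step 1; the column names TeamId and TeamName are assumptions about the example schema.

```sql
-- Step 1 (sketch): ensure a usable parent row exists in dbo.Team before the test acts.
IF NOT EXISTS (SELECT 1 FROM dbo.Team WHERE TeamName = 'Schema Test Team')
    INSERT dbo.Team (TeamName) VALUES ('Schema Test Team');

DECLARE @TeamId INT;
SELECT @TeamId = TeamId FROM dbo.Team WHERE TeamName = 'Schema Test Team';
-- @TeamId is then used to satisfy the foreign key when inserting into dbo.Employee (step 2).
```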

One little-documented but very useful feature is that, when creating a foreign key on a fake table, tSQLt checks for the existence of a suitable unique key on the referenced table and adds a unique index if none is present.  This is particularly helpful when the parent table has also been faked and so has no primary or unique keys of its own.

It is clear from the lists above that option 2, writing the test against the real tables, requires far more setup.  Option 1, the test against the mock schema, will be faster to write and will require less maintenance in the future.

One of the tenets of TDD (in any language) is that the system should be left in exactly the same state after a test as it was before.  This is important because it allows tests to be run in any combination and in any order; one test should never depend on the successful completion of another.  tSQLt does an excellent job of rolling back all data changes made during each test, but some things are outside its control.

Option 2 also has a potential disadvantage in that it does not leave the database in entirely the same state as before the test.  Any inserts to tables with auto-increment columns will have been rolled back, potentially leaving gaps in the IDENTITY sequence.  As Sebastian Meine explains here, you can never depend on unbroken identity sequences in a multi-user system (or anywhere rollbacks may occur), so no code should rely on them.  So we are breaking the rule about returning the system to its original state after the test, but in a well-designed database this should have no impact.
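The gap is easy to demonstrate outside tSQLt; a rolled-back insert still consumes its identity value:

```sql
-- Standalone demonstration: identity values are not reused after a rollback.
CREATE TABLE #Demo (Id INT IDENTITY(1,1), Val CHAR(1));
INSERT #Demo (Val) VALUES ('a');   -- Id = 1
BEGIN TRAN;
INSERT #Demo (Val) VALUES ('b');   -- Id = 2
ROLLBACK;                          -- insert undone, but the identity seed has moved on
INSERT #Demo (Val) VALUES ('c');   -- Id = 3, leaving a gap at 2
SELECT Id, Val FROM #Demo;         -- returns (1, 'a') and (3, 'c')
DROP TABLE #Demo;
```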

So whichever way you look at it, option 1 looks like the way to go, except for one thing: you are not testing your code; you are testing the tSQLt framework's interpretation of your code.  The above example was simple, and tSQLt would have done a good job of copying the implementation of [FK_Customer_MarketingSource] to the fake tables.

Now imagine that we want to write a test to check that a DELETE from [Customer] cascades correctly to [CustomerAddress].  If you know in advance that tSQLt doesn't currently support cascade actions on foreign keys, you can work around it.  But the first time I tried this I got caught out and spent a good while trying to work out why my test was failing even though my code looked perfect.  In that instance I needed to take the pain of the additional setup and run my test against the real foreign key and tables, along the lines of the sketch below.
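For completeness, here is roughly what that real-table test looks like.  All the column names are assumptions about the example schema, and in a real test each insert would use the check-and-insert pattern from option 2 so that the natural keys are not violated by leftover data.

```sql
CREATE PROCEDURE testSchema.[test delete from Customer cascades to CustomerAddress]
AS
BEGIN
    -- Arrange: build the full parent chain against the real tables
    -- (every NOT NULL column and foreign key must be satisfied; column names assumed).
    INSERT dbo.Team (TeamName) VALUES ('Cascade Test Team');
    DECLARE @TeamId INT;  SET @TeamId = SCOPE_IDENTITY();

    INSERT dbo.Employee (TeamId, EmployeeName) VALUES (@TeamId, 'Cascade Tester');
    DECLARE @EmployeeId INT;  SET @EmployeeId = SCOPE_IDENTITY();

    INSERT dbo.CustomerType (CustomerTypeName) VALUES ('Cascade Test Type');
    INSERT dbo.MarketingSource (MarketingSourceName) VALUES ('Cascade Test Source');

    INSERT dbo.Customer (CustomerName, CustomerTypeId, MarketingSourceId, EmployeeId)
    SELECT 'Cascade Test Customer', ct.CustomerTypeId, ms.MarketingSourceId, @EmployeeId
    FROM dbo.CustomerType ct, dbo.MarketingSource ms
    WHERE ct.CustomerTypeName = 'Cascade Test Type'
      AND ms.MarketingSourceName = 'Cascade Test Source';
    DECLARE @CustomerId INT;  SET @CustomerId = SCOPE_IDENTITY();

    INSERT dbo.Address (AddressLine1) VALUES ('1 Test Street');
    DECLARE @AddressId INT;  SET @AddressId = SCOPE_IDENTITY();

    INSERT dbo.CustomerAddress (CustomerId, AddressId) VALUES (@CustomerId, @AddressId);

    -- Act: delete the parent row; FK_CustomerAddress_Customer should cascade.
    DELETE dbo.Customer WHERE CustomerId = @CustomerId;

    -- Assert: the child row has been removed by the cascade.
    DECLARE @Remaining INT;
    SELECT @Remaining = COUNT(*) FROM dbo.CustomerAddress WHERE CustomerId = @CustomerId;
    EXEC tSQLt.AssertEquals 0, @Remaining;
END;
```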

In the .NET and Java world, some exponents of TDD would argue that overuse of mock objects may be a code smell indicating a less than optimal design.  The danger with having such a great hammer ([FakeTable] and [ApplyConstraint]) is that sooner or later everything starts to look like a nail.  So are we dealing with a code smell here, or do we have to modify the database unit testing paradigm because databases are different?

I have tried to list the pros and cons of both approaches, but would be interested to hear your thoughts and practices before expressing my own opinion.

So my question to you is this: when should tests be written against real tables, and when is it appropriate to use fake tables and keys?

What do you normally do when writing tests?  Am I being too philosophical when I talk about the fact that I’m not actually testing my own code when using fake tables?  Or do we have to apply different rules because databases are different (to .NET, Java etc.)?


1 comment

Simon August 17, 2012 at 08:21

IMO when testing you should only be testing your own code.  In the .NET world there's little point in testing that an auto getter/setter works – I assume Microsoft have tested that – or that String.Format does what it should.  Applying that to this discussion, I would argue that you should accept that tSQLt fake tables are accurate representations of your own code.  If there were any question over that reliability then there would be very deep issues and you would need to test tSQLt in all areas… clearly a pointless exercise.

I am only just starting out with tSQLt, but my initial reaction is that it is appropriate to write tests against real tables when you don't make changes to the table: read-only tests where you are working against static/lookup tables, or where you have a separate version of the database used only for testing, from which you copy the procedure/table/whatever over to the development environment once you're satisfied.

In summary, I think most testing should use fakes to isolate pure, ideal versions of the data so the logic can be exercised without being held to ransom by the actual data.  But that should be backed up by testing against the actual tables where possible – probably something like 90% of tests against fakes and 10% against real structures (though of course that ratio may vary; this is just my feeling at the moment and it may change as I do more testing).

Comments on this entry are closed.
