The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

T-SQL Coding Standard

Hi guys

I started working on a standard for coding T-SQL and would appreciate if some of you guys would be kind enough to have a quick review and comment on it.

Its mostly concerned with style, but I will add a optimization chapter at a later time.

Its located at http://www.codeproject.com/useritems/T-SQL_Coding_Standard.asp

Best regards,
Casper
Casper Nielsen Send private email
Friday, September 22, 2006
 
 
Some points:

1. Spellcheck. I forget to do it too, but ya gotta do it.
2. DO NOT prefix stored proc names with sp.
3. The doc is about tql coding standards, but you discuss things like column and table names. Kinda a grey area.
4. Foreign keys: sometimes it is helpful to include the reference in the name.
5. (pg.5, #10) What do I do when a proc does more than one? (ins, upd, del). Is this assuming that multi-proc operations are done somewhere other than inside the db? The DAL? The BAL?
6. (pg.5, #15) I don't understand "new scope". You are starting and ending a new transaction. I don't think scope is correct. Let me know if I just haven't had enough coffee.
7. (pg.6, #16) see #6 above. Why are you doing a tran? Error handling?
8. (pg.6, #17) the avoid example will only work if address is the only table in the statement with a column called 'street' (i think). You are wanting them to alias to avoid the query engine from reporting ambiguous column, I would drop the 'A' alias from the column
9. (pg.6, #21) For the test, try this:

--TEST HARNESS
/*
test stuff goes here
*/
--END TEST HARNESS

Then when you need to use it, you just inline comment the block comments

--TEST HARNESS
--/*
test stuff goes here
--*/
--END TEST HARNESS

10. (pg.7,#22) I'm not particularly into having anyone but dbo owning anything. Permissions issues can be a PITA. Also, what will you do to have the other code running hit the correct table?

11. 'Database Design' & 'Coding Practices'. Maybe because I haven't seen the C# doc that you are going off of, but I didn't expect this much scope. Some of these are best practices, while others are very, very, very situational. If you are authoring a doc for the general public, it either needs to state the specific situation (Coding Standards for high transactional databases) or be extremely general. If the purpose was to better the practices where you are employed, then it can be really specific, but kinda limits it's usage to most people outside of your company.

In either case, good effort. =)
D in PHX Send private email
Friday, September 22, 2006
 
 
Generally, I agree with most of what you had with a few exceptions (so, good work!).

"the regular expression [a-zA-Z0-9]+" for naming on top of the previously mentioned camelCase - seems excessive and seems too specific to have a RE explictly listed as a rule for this unless you are going to script out everything and run an exception report (and not strictly correct - since I'm pretty sure SQL Server will not let you have a table named 2table).

You should not prefix SP names with sp_ because of the master database search involced.  Without the _, I don't believe the same issues come into play, so not a biggy.

We tend to use usp for user stored procedure so have uspObjectOperation.  We put the operation second because we'd prefer things closely associated by object, not operation.

We never use any schema but dbo.  I think we understand how the feature is meant to be used, but we like to restrict users from creating tables in our database.

I would make explicit a suggestion to avoid SELECT * in production code except where justified.  I may have missed this.  A lot of examples do this, and that's fine, but SELECT * should be restricted to places where it improves maintenance only.  It can actually cause maintenance problems due to newly emergent ambiguity in any case where joins are involved on tables of which one or more are actually evolving.  And of course, needless performance overhead.

We usually (but not always) have a high-performing GUID as our primary key, and we find this necessary.  Our GUID has special properties and maps to our OO layer.  Ints have a lot of drawbacks - and there are natural primary keys which are not int and which are preferred to an int.  Int also leads to identity columns which can be a maintenance pain if you move to multi-tenant model, merging data or replicating data.  I have no firm rules on this, however, but any time anyone says something about primary keys it needs to simply be shown to have been thoroughly examined and justified.  I think there are scenarios for a number of different classic design and usage patterns for primary keys, and a manual just of primary key design issues might be its own white paper.
Cade Roux Send private email
Friday, September 22, 2006
 
 
Why use a singular form for table names, e.g. "Address" instead of "Addresses"?

You don't mention bulk insert: might bulk insert an exception to the everything-in-a-stored-procedure rule? I'm currently experimenting with the SqlBulkCopy class.
Christopher Wells Send private email
Friday, September 22, 2006
 
 
Also, you said "Using designers to generate DDL however is allowed and encouraged".

What if T-SQL is created by a designer: what's the reason why even machine-generated T-SQL should be in a stored procedure or function, instead of being allowed in the C# layer?
Christopher Wells Send private email
Friday, September 22, 2006
 
 
Hi all

Thanks for the feedback!

Have started rewriting the standard to take into account both the errors you have identified and the suggestions.

Will take a little while to finish and will post here when done.

Best regards,
Casper Leon Nielsen
Casper Nielsen Send private email
Sunday, September 24, 2006
 
 
Taking the replies one by one, first to "D in PHX":

1. Done.
2. A lot of people have commented on this and it seems most people prefer “usp” so I changed the standard to reflect this. For a longer discussion, see: http://www.codeproject.com/useritems/T-SQL_Coding_Standard.asp in the discussion section.
3. see #11
4. Can you please elaborate on this point?
5. The standard by no means say that more complex operations should be done in other layers, quite the opposite actually: It should aim at having all set-close operations implemented in the database. I have changed this section to have a more precise definition (using Cade Roux very logical practices).
6. Well… No transactions are made here. There is a difference between “BEGIN” and “BEGIN TRAN”. This rule is identical to having “if(test) callfunction();” rewritten to “if(test){callfunction()}” in c#.
7. see #6 above.
8. Changed this section to a more precise example.
9. I cant see your partucalar implementation of a test harness adding anything valuable, it actually adds the need to comment two extra lines to use the harness.
10. a) Have the group user assigned to the same permissions as dbo to avoid permission issues. b).  to access UDO’s that are owned by different schema than dbo, simply call it using that schema name, i.e: “SELECT * FROM schema.tablename”.
11. I will rewrite the document to have subsections using different scenarios and best practices for them – good point!

Will upload the altered document when Im done going through all the suggestions from this forum (probably tomorrow).
Casper Nielsen Send private email
Sunday, September 24, 2006
 
 
Cade Roux


Thanks to you  for the suggestions!

”regular expression”
I used a regular expression not as much to indicate a rule checking mechanism, but more to have a precise definition of allowed names. Especially I wanted to prohibit the use of underscores and language dependent characters. Ive rewritten this section with an updated regular expression and a short text describing it.

“SP naming”
Changed the naming of procedures to “usp” as this seems to be the more generally accepted form.
Liked the “uspObjectOperation” form and included this in the standard.

“schemas used for grouping”
Using schemas to group objects can have many advantages over simply naming the groups, but having it as a style rule is wrong I agree. I will move it to “Coding practices” instead. It now have this wording:” Try avoiding to prefix any UDO beyond what is described here. If you need to further group objects, consider using a schema different from dbo to create the objects.”

“I think we understand how the feature is meant to be used, but we like to restrict users from creating tables in our database.” – remember to differantiate between external and internal users – internal Sql Server users serves two purposes: The map external users, but can also exist “by themselves” without any user mapping directly t0 them – in effect being schemas. (or combinations hereof).

“SELECT *”
I have to consider this more carefully tbh – personally I use tons of these and think they are very powerful indeed. I’m aware of their failings however, but think to band them as a rule is wrong – think I will need to compile some examples where they are good and where they are not. I did put a rule that they are not to be used in insert/update statements where I see the problems most clearly.

“Primary keys”
Again you are correct… Of course there should be an option to use guids as primary keys. Both ints and guids have there drawbacks and mature developers should be able to chose between the two. Personally I don’t like using guids as they require intensive cut n’ paste to write even the simplest test-queries, but they have immense value for replicated scenarios. Again this rule belongs in a best practises section and will be moved.
Regarding the issue on other primary key scenarios im simply not informed enough on the subject to say whether or not this is the case – personally I have never seen a design that I found to be good that didn use either guids or non-informational ints as primary keys. But would be glad to see an example…
Casper Nielsen Send private email
Sunday, September 24, 2006
 
 
> Why use a singular form for table names, e.g. "Address" instead of "Addresses"?

For example when I get a schema using the GetSchema method, the schema contains/returns collections which are named e.g. 'Tables', 'Columns', 'Views', etc.; using a plural noun for table names seems better to me.
Christopher Wells Send private email
Sunday, September 24, 2006
 
 
Some SQL Server 2000 error-handling tips that you might want to add to your document:

1) Use sysmessage, sp_addmessage, and placeholders in preference to hard-coded error messages.
2) When using sp_addmessage and sysmessages, always use error message number of 50001 or greater.
3) With RAISERROR, always supply a severity level <= 10 for warning messages.
4) With RAISERROR, always supply a severity level between 11 and 16 for error messages.
5) Remember that using RAISERROR doesn't always abort the current batch, even in trigger context.
6) Always save @@error to a local variable before using it or interrogating it.
7) Always save @@rowcount to a local variable before using it or interrogating it.
8) For a stored procedure, use the return value to indicate success/failure only, not any other/extra information.
9) Return value for a stored procedure should be set to 0 to indicate success, non-zero to indicate failure.
10) Set ANSI_WARNINGS ON - this detects null values in any aggregate assignment, and any assignment that exceeds the
maximum length of a character or binary column.
11) Set NOCOUNT ON, for many reasons.
12) Think carefully about whether you want XACT_ABORT ON or OFF. Whichever way you go, be consistent.
13) Exit on the first error - this implements the KISS model.
14) When executing a stored procedure, always check both @@error and the return value. For example:

EXEC @err = some_stored_proc @value
SET  @save_error = @@error
-- NULLIF says that if @err is 0, this is same as null
-- COALESCE returns first non-null value in its arguments    
SELECT @err = COALESCE( NULLIF(@err, 0), @save_error )
IF @err <> 0 BEGIN
-- Because sp may have started a tran it didn't commit
    ROLLBACK TRANSACTION
    RETURN @err
END

15) When executing a local stored procedure that results in an error, do a rollback because it's possible for the
procedure to have started a transaction that it didn't commit or rollback.
16) Don't assume that just because you haven't started a transaction, there isn't any active transaction - the caller may have started one.
17) Ideally, avoid doing rollback on a transaction that was started by your caller - check @@trancount.
18) But in a trigger, always do rollback, as you don't know whether the caller initiated an active transaction (because @@trancount is always >= 1).
19) Always store and check @@error after the following statements:
    INSERT, DELETE, UPDATE
    SELECT INTO
    Invocation of stored procedures
    invocation of dynamic SQL
    COMMIT TRANSACTION
    DECLARE and OPEN CURSOR
    FETCH from cursor
    WRITETEXT and UPDATETEXT
20) If DECLARE CURSOR fails on a process-global cursor (the default), issue a statement to deallocate the cursor.
21) Be careful with an error in a UDF. When an error occurs in a UDF, execution of the function is aborted immediately and so is the query that invoked the UDF - but @@error is 0! You may want to run with SET XACT_ABORT ON in these circumstances.
22) If you want to use dynamic SQL, try to have only a single SELECT in each batch because @@error only holds the
status of the last command executed. The most likely errors from a batch of dynamic SQL are syntax errors, and these aren't handled by SET XACT_ABORT ON.
Mark Pearce Send private email
Monday, September 25, 2006
 
 
"Christopher Wells"

And thanks to also you for all the input!

”Table names”
Well I kinda expected this to create some discussions. The reasoning for naming tables in the singular form goes like this
1. Tables are the most central object type in the database – and the most referred - this means we want the naming convention to be as uniform and simple as possible.
2. All tables without exception stores many entities.

Basically we have two choices for naming the tables:
Singular and plural.

Examining plural we see that it conflicts with 1 in that is not the simplest naming form.
It fits very well with 2 if we see a table as a grouping of entities, but not if we perceive it as an object – if there was a case where a table where describing only 1 single entity this would not be the case, but I have yet to see a table designed for containing only 1 entry.

After considering this I decided to name tables in the singular form. (For non-English persons like myself it gives the added value of not having to remember all the funny ‘s’ ‘es’ ‘ses’ ending on the nouns.


“Bulk insert”
Well no I don’t see it as an exception at all. First of the operation should only be performed in the DAL.  Second there is no rule saying “put everything in a stored procedure” – it says “put all t-sql in a stored procedure”. So use stored procedures to ready the tables for bulk copy and use the SqlBulkCopy class to define the column mappings. (Personally I would consider using “BULK INSERT” in a stored procedure or a SSIS script).

“What’s the reason why even machine-generated T-SQL should be in a stored procedure or function, instead of being allowed in the C# layer?”
The reasoning for not using designers to generate T-sql is:
The output of Microsoft’s designers is very poorly formatted and is not intended for human eyes – this breaks the style guidelines. Furthermore opening a nicely formatted piece of code in a designer ruins its formatting.

Having t-sql exist in the c# layers is in my opinion simply bad practise:
* Having it in strings creates the need to concatenate strings for close-to-acceptable reading and invites a host of situations with incomprehensible errors pertaining to this.
* Using the designers, sql can be stored in table adapters, but this constraints alteration to go through these poor designers. Furthermore it invites the programmer to use these in the presentation layer which is a totally non-acceptable architectural decision (unless of course you are a Microsoft Visual Studio sales demonstrator, in which case its nifty to be able to create a full application in 5 minutes – for everyone else its simply a very-bad-idea).

And more importantly: Having a uniform programming interface (the database itself) gives a lot of advantages including the major one that Programmers only ever have to look one place for t-sql code and all code can be tested and used via sql-scripts

Best regards,
Casper Leon Nielsen


(PS: I just uploaded the version 0.3 document with all the corrections stated in this forum so far)
Casper Nielsen Send private email
Monday, September 25, 2006
 
 
Concerning the suggestions of Mark Pearce:

Thank you for all the suggestions!

Most of your suggestions pertains to using the Sql Server 2000 - this document is made for Sql Server 2005. This is very clear if one opens the document, but unfortunately I have not made it as clear when looking at the document web-page.

Some of the suggestions are very good also for 2005 and I will include some in a special Transaction Best Practice section at my earliest convenience.

Best Regards,
Casper Leon Nielsen
Casper Nielsen Send private email
Monday, September 25, 2006
 
 
"personally I have never seen a design that I found to be good that didn use either guids or non-informational ints as primary keys. But would be glad to see an example… "

There are natural primary keys which are not ints and not Guids.  In these cases there is no reason for an arbitrary internal primary key unless there is an (possibly misguided) attempt to save storage space.

In particular, it is often the case when a design involves an external data source or system.

For instance, in interfacing to a standard code system - like medical procedure codes.  The primary key in such a table would be the alphanumeric procedure code.  It is possible that the third-party table would have that as the primary key, but then the main working tables in your app would use an internal id.

When receiving data from third party vendors, I tend to have a composite primary key SupplierID, ProductID, where the ProductID is the SKU from the vendor.  There may be an internal key for my system but the table containing the master pricing from the vendors has the primary key as SupplierID (my internal ID)/ProductID (their natural key).

Another case is when dealing with HR systems where there is already an employee file number exposed.

In all these cases, it is very possible there there will be a table without any artificial primary key, and where one will not be assigned - especially if the data is replaced on a regular basis.

There are also cases with some lookup tables, especially with abbreviations.  Often a single character abbreviation is the key and the abbreviation and description are the only columns in a lookup table.
Cade Roux Send private email
Tuesday, September 26, 2006
 
 
Guess I asked for it… ; )

Must admit I started this project feeling quite confident on my practices, but I can sense that my experience level is inadequate to write a standard that will be useful for all.
Saying that I hope you will bear with me in the following.

The reasons I initially introduced the concept of using internal id’s over external id’s in the database is to decomplicate the programming model.
Having only ints and guids used as primary keys, introduces the following bonuses:

1) Simplified and uniform query model:
Most queries become more simple to write - as joins will only have to be performed on a single key.
Basing the query on an external id however, will in specific situations still create the need to join on the composite key.

2) Simplified identification of entries in all layers.
Having composite keys may on the surface seem a natural choice when looking at the specific data layout the database is supposed to be modelling – however it complicates the identification of rows in other programming layers.
In the business layer we also have to identify the products by these composite keys, creating the need to encapsulate say an int and a string in a structure denominating the product – lets call it a “Key”. Each and every time we have to handle a row or a group of rows in the business-layer we have to make the conversion from a row to this Key structure.
With the simplified model I propose, using only int or guids as complete identificaters for products, we eliminate this need to encapsulate identifiers in Key implementations.
Making say a web-application with two pages: a listview on a table and a detailsview pertaining to a particular row - is therefore more easily abstracted when using only one type of key: Pressing a row we read the primary key directly and sends this to the detailsview page – this will work for all tables. Using a composite key we have to make a particular implementation to handle the specific table.

3) Speed of execution:
Well it’s a very involved discussion, but lets just say my assumption is that it is possible to design the database in such a way that for queries mapping to and from the composite keys it will perform the same as having composite primary keys. For internal queries it will outperform composite primary keys.

This pattern however is only valid some scenarios: If the data is highly external then it becomes increasingly cumbersome to handle the mapping in and out of internal id’s, especially for insert/update operations.
I think we can say that this pattern should not be used when the datamodel is highly changing and external. For isolated databases however I still feel it’s generally the best approach.

“There are also cases with some lookup tables, especially with abbreviations.  Often a single character abbreviation is the key and the abbreviation and description are the only columns in a lookup table.”
With abbreviations do you mean situations like having a field denominating the state of the row like: “Updated”, “In stock”, “PENDING” and so forth?

If so, this is a question of normalisation – to use an abbreviation as foreign key is a slight denormalisation from having the foreign key being an int: we use the key itself to carry some information.
I am aware however that using an abbreviation as foreign key will help the developer visually identify the state of the row more easily than remembering the numeric value. However this approach have some pitfalls: How complicated will we accept these abbreviations to become? Is single abbreviations the limit? If so what happens when we have to abbreviate both “Accepted” and “Absolved” – which one should be called “a”?
The bonuses stated above will also disappear.
This said however it is far more acceptable to use abbreviations as keys, rather than the shocking abuse where the abbreviation is not even a key, but simply a value.

As you describe there will also be other situations, staging tables etc, where artifical keys should not be used.

So:
I will rewrite the standard to be more precise on when to use the int/guids primary key pattern and when to use the external id pattern. I will also reformulate the rule from “Avoid” to something along the lines of “Consider”.

Again thank for all your valuable inputs!

Best regards,
Casper Leon Nielsen
Casper Nielsen Send private email
Wednesday, September 27, 2006
 
 
> In the business layer we also have to identify the products by these composite keys

FWIW, I don't know about you but I'm likely to represent row data using something like a Dictionary class, for example:

  using System.Collections.Generic;
  Dictionary<int,CustomerData> customers; //int is customer id key

Given that I'm using a generic template class, it hardly matters what the type of the key is (for example int, string, Guid, or something composite)
Christopher Wells Send private email
Wednesday, September 27, 2006
 
 
"I'm likely to represent row data using something like a Dictionary class"

Well this is acceptable if you for all datasets are making custom BusinessObjects to wrap these.
This is a very cumbersome process however.

I use another model where I wrap Tables and simple DataSets into Typed DataSets and use these as business objects directly, using them as datasources in the presentation layer - a model I imagine is widely used.
For this model to be acceptably layered I off course never put TableAdapters into the Typed DataSets, but populate them in a DataLayer.

"Given that I'm using a generic template class, it hardly matters what the type of the key is (...) "
On the contrary: For the scenario I describe in my last post the type matters no matter what generic containers you use. In the scenario above where you need to call a detailsview page from a list page with a id as argument, you have to be able to serialize that id to a string to be able to append it in f.ex a url, hence the need for either a wrapper class on the full identifier (and a string serialization implementation on that wrapper), or a concrete implementation of the detailspage having specific typed arguments for all the components of the key.

Best Regards,
Casper
Casper Nielsen Send private email
Wednesday, September 27, 2006
 
 
> We usually ... have a high-performing GUID as our primary key

I might have worked out what you mean by a "high-performing" GUID: see http://sqljunkies.com/Article/4067A1B1-C31C-4EAF-86C3-80513451FC03.scuk and http://www.informit.com/discussion/index.asp?postid=a8275a70-0698-46f0-8c8f-bf687464628c&rl=1
Christopher Wells Send private email
Saturday, September 30, 2006
 
 
On schema and users:

In 2005, a user does not own objects, a user have the right to use a specific schema. So, in 2000, dbo was considered a database user. In 2005, dbo is a schema with the sa as default owner of that schema but where others can have the right to use it. In 2005, you will not get the known problem in 2000 that a user cannot be dropped because he owns something
Casper Nielsen Send private email
Monday, October 02, 2006
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz