Adobe Solution Partner

March 1, 2007

A Database Independent “Next ID”

Filed under: ColdFusion, Databases — Tags: , , — Daryl Banttari @ 10:09 am

A blog entry at:

http://www.coldfusionmuse.com/index.cfm/2005/6/10/sqlInterbase

reminded me of how people don’t fully understand the multithreaded nature of ColdFusion with respect to databases when looking for the next ID value for a table.

People quite often do something like this in a page:

<cfquery name="qNextID" datasource="#request.dsn#">
select max(id) as lastid from sometable
</cfquery>

<cfquery datasource="#request.dsn#">
insert into sometable(id, data)
values (#lastid+1#, '#somedata#')
</cfquery>

However, this will give you duplicate IDs when two pages run concurrently. For example, request A and request B run at the same time:

Request A selects max(id) AND gets 24999
Request B selects max(id) AND gets 24999
Request A inserts INTO sometable WITH id 25000
Request B inserts INTO sometable WITH id 25000 AND FAILS

Or worse, if there’s no unique (eg, primary key) index on sometable.ID, you wind up with two identical ID values in the table.

Wrapping them in a transaction doesn’t help. The default isolation level doesn’t hold locks for reads, so the result is the same. Using a higher isolation level simply changes the failure mode to a deadlock. Better than a duplicate ID, but still a failure. An additional drawback of this method is that the database has to seek the index on ID every time, which for very large tables is expensive, relative to the following method.

The best cross-database method of implementing a unique ID is to create a table to store them. I usually name the table NextIDs(TableName varchar(50), NextID int)

Then, you start the transaction and do the update statement FIRST:

<cftransaction>

<cfquery datasource="#request.dsn#">
update NextIDs
set nextID = nextID + 1
where tablename = '#tablename#'
</cfquery>

<cfquery name="qNextID" datasource="#request.dsn#">
select NextID
from NextIDs
where tablename = '#tablename#'
</cfquery>

</cftransaction>

<cfquery datasource="#request.dsn#">
insert into tablename (id, data)
values (qNextID.NextID, '#data#')
</cfquery>

You /must/ do the update first, or you’ll still have the race condition under the default transaction isolation level; if you set isolation to REPEATABLE_READ or SERIALIZABLE you’ll get deadlocks when the race condition occurs.

To summarize:

Select max(id) is never a thread-safe way of implementing a unique identifier. If you want to implement a cross-database method of creating a unique, incrementing identifier, you should create a table for that purpose, and UPDATE before SELECTing from that table (in a transaction) to eliminate race conditions and deadlocks.

Share and Enjoy:
  • Digg
  • del.icio.us
  • Facebook
  • Google Bookmarks
  • LinkedIn
  • StumbleUpon
  • Technorati
  • TwitThis

12 Comments »

  1. Hi Daryl, I know this is a cross-database post, but is the following okay for SQL Server only?

    set nocount on
    insert into …
    select @@identity as newid
    set nocount off

    I’ve been using that for years on SQL Server…

    Comment by Nat Papovich — March 1, 2007 @ 12:00 am

  2. Do note that this approach will single thread your entire server architecture for insert operations due to lock contention on the row/page of the single row key table. Anytime you issue the update, then select to get the next value, the RDBMS will lock the single row/page of the key table until the end of the transaction. The UPDATE lock is an exclusive lock which will prevent the next key generation call from being able to do its own UPDATE operation. This means you can only generate a single key within a single logical unit of work (i.e. critical section). Although this is indeed thread-safe it is also completely serialized. In other words, if you had an order entry system where 1,000 orders all occurred at the same time, they would all require generation of a primary key value and hence would not occur in parallel but in a purely serial FIFO fashion. There are several table based solutions folks have used including pre-allocating blocks keys into a keystaging table. With row-level locking by having a block of keys in the staging table and not bump into contention issues on the read/delete of the next available key.

    Celko also has several great solutions for this issue.

    Comment by Dave Wolf — March 1, 2007 @ 12:00 am

  3. The @@identity trick in SQLServer is also very dubious and can return very erroneous results. Basically @@identity is a global variable, and returns back the last identity value generated. So take the case where you insert a row, and a trigger fires to populate a child table, or other metadata and that table itself has an identity value. @@identity returns the second identity value, the one caused by the insert from the trigger and not from the parent table. Since the child table could have less rows, it could very easily return a value for an already used key. @@identity is very much not thread safe.

    Comment by Dave Wolf — March 1, 2007 @ 12:00 am

  4. "Wrapping them in a transaction doesn’t help. "

    Would you mind going into more depth on that? I always thought when the docs said cftransaction "groups database queries into a unit," it pretty much meant that would be the case.

    Comment by Sam — March 1, 2007 @ 12:00 am

  5. Nat,

    Using @@Identity is fine for SQL Server, so long as you don’t have triggers that do inserts into other tables that have Identity columns.

    Dave,

    Any time you have any variable that you want to update consistently across threads, you MUST single-thread changes to the variable. To do so, you MUST have some sort of semaphore to control access across threads; in this case, it’s the update lock. The transaction block where you’re single threading is blindingly short, and should be complete in 4ms or less, so concurrency really only becomes a problem if you’re doing 250+ inserts a second, in which case using something DBMS-specific is probably a good idea.

    In SQL Server, @@Identity is not a "trick", it’s a well defined feature, and will only give "dubious" or "very erroneous" results if you either (a) have triggers, or (b) don’t do the SELECT @@IDENTITY operation before the next identity insert. One popular mistake is to do "SELECT @@IDENTITY FROM SomeTable", which simply does a Cartesian join from the constant returned by @@IDENTITY, and the rows in SomeTable. So, if SomeTable had 100,000 rows in this case, you’d get 100,000 copies of the @@IDENTITY value. And if your last identity insert was actually for SomeOtherTable, you’d get a completely unexpected value.

    Also, @@IDENTITY is not a global variable, it’s a connection variable. So it doesn’t return the last identity value inserted, it returns the last identity value inserted /on the current connection/. Since ColdFusion pools connections on a per-page basis even if you have connection pooling turned off, it’s completely safe from that perspective.

    If you have triggers, the scope_identity() function is designed to return the proper identity value, like this:

    <code>
    <cfquery name="q" datasource="Northwind">
    INSERT INTO SomeTable
    VALUES (‘foo’)
    </cfquery>
    <cfquery name="q" datasource="Northwind">
    SELECT SCOPE_IDENTITY() AS ID
    </cfquery>
    </code>

    Sam,

    A transaction groups queries into a logical group of work, and ensures that all of the *changes* in the transaction either commit, or roll back. However, a transaction doesn’t change the isolation level for reads, which by default is "READ COMMITTED". READ COMMITTED will acquire and hold read locks only long enough to read the row(s), and release the lock once the row is read. The next level, REPEATABLE READ will hold locks on any rows that were actually read until the end of the transaction, and SERIALIZABLE will go so far as to prevent inserts between ranges you selected from. But none of these prevent you from reading a value that another thread/connection has read (since read locks can be made concurrently), so the sort of read consistency you’re expecting is never enforced. This is why I do the update first, as that will actually block other threads/connections from reading the value I updated until the end of the transaction. (The isolation level READ COMMITTED won’t read the row until the transaction is committed.)

    Hope that helps… :-)

    –Daryl

    Comment by Daryl Banttari — March 1, 2007 @ 12:00 am

  6. Thanks, that did explain it quite well.

    Comment by Sam — March 1, 2007 @ 12:00 am

  7. Interesting timing on this post, Daryl. I discovered something yesterday that seems to have changed in CF although I’m not sure when. I’m developing an app on 7.0.2 w/ MSSQL and discovered that I could do the insert and select scope_identity() as newID within the same cfquery tag set and it works just fine. I remember having to use nocount on/off in the past (as Nat indicated) to pull this off. I’m not sure when this changed, just that it works now without it and in the same cfquery tag.

    So:
    [cfquery name="insQuery"]
    INSERT INTO SomeTable
    (stuff, things)
    VALUES
    ("stuff","things")
    SELECT SCOPE_IDENTITY() AS newID
    [/cfquery]
    will leave you with #insQuery.newID# as your new id.

    While it seems to work just fine, I’m also not sure if there’s a good reason to *not* do it that way so if anyone knows that to be a bad idea for some reason please enlighten. It’s all done within the context of a transaction with isolation="SERIALIZABLE" (as the queryname.newID is used to populate child tables and such and performance/scalability isn’t an issue in this case) so it all seems pretty safe.

    Comment by Scott Krebs — March 1, 2007 @ 12:00 am

  8. I wish more databases used sequences like oracle and postgres

    http://www.psoug.org/reference/sequences.html
    http://www.postgresql.org/docs/8.1/static/sql-createsequence.html

    really nice and simple to use

    Comment by zac spitzer — March 1, 2007 @ 12:00 am

  9. Scott,

    SQL Server always seemed to support multiple statements in the same query, but the DataDirect drivers used by CF stopped supporting that a few releases ago. (Any query that began with the word "INSERT" was no longer allowed to return a resultset.) I didn’t know that they fixed that– good to hear.

    In any case, for future compatibility reasons, I’d recommend using two CFQUERYs for that block of code. Also, SCOPE_IDENTITY() is guaranteed to be correct, so long as it’s called before the next identity insert on the same connection. So, you don’t need to bother with a cftransaction when using that; the transaction, in your case, only adds extra db call and locking overhead, and may contribute to deadlocking problems in the future.

    Zac,

    I really wish that SQL had defined a standard for getting generated sequence numbers. But then, they didn’t even define how to format dates and times; that’s implementation-dependent. (JDBC and ODBC define a standard for formatting dates and times; it’s a responsibility of the JDBC or ODBC driver to convert dates and times in queries to a format palatable to the individual DBMS. That’s why the helper function in CF is called "CreateODBCDateTime", as opposed to "CreateSQLDateTime".)

    –Daryl

    Comment by Daryl Banttari — March 1, 2007 @ 12:00 am

  10. Welcome to our website for you World of Warcraft Gold,Wow Gold,Cheap World of Warcraft Gold,cheap wow gold,buy cheap wow gold,real wow gold,sell wow gold, …
    Here wow gold of 1000 gold at $68.99-$80.99 ,World Of Warcraft Gold,buy wow gold,sell world of warcraft gold(wow gold),buy euro gold wow Cheap wow gold,cheapest wow gold store … <a href="http://www.33game.cn"><strong>ffxi gil</strong></a> buy euro gold wow wow gold–buy cheap wow gold,sell wow gold.welcome to buy cheap wow gold–cheap, easy, wow gold purchasing.World of Warcraft,wow gold Super …
    We can have your wow gold,buy wow gold,wow gold game,world of warcraft gold, wow Gold Cheap wow, Cheap wow gold,world of warcraft gold deal,Cheap WOW Gold …

    Welcome to our website for you World of Warcraft Gold,Wow Gold,Cheap World of Warcraft Gold,wow gold,buy cheap wow gold,real wow gold,sell wow gold, …
    Here wow gold of 1000 gold at $68.99-$80.99,World Of Warcraft Gold,buy wow gold,sell world of warcraft gold(wow gold),buy gold wow lightninghoof instock Cheap wow gold,cheapest wow gold store …
    <strong>ffxi gil</strong> wow gold–buy cheap wow gold,sell wow gold.welcome to buy cheap wow gold–cheap, easy, wow gold purchasing.World of Warcraft,wow gold Super …
    Wow gold- Gold for buy gold wow lightninghoof instock EU-Server: …wow Gold EU: starting from 84,99?; 3000 WoW Gold EU: starting from 119,99?. wow Gold- Leveling Services: …
    We can have your wow Gold,buy wow Gold,wow Gold game,wow gold, Cheap wow Gold, Cheap World of Warcraft Gold,world of warcraft gold deal,buy cheap wow gold,Cheap WOW Gold …

    Here wow Gold of 1000 gold at $68.99-$80.99,World Of Warcraft Gold,buy wow Gold,sell world of warcraft gold(wow gold),Cheap wow gold,cheapest World of Warcraft Gold store …

    Comment by xiaoxinwow — April 6, 2007 @ 12:00 am

  11. hi daryl

    thanks for posting this. i wasn’t aware of the
    limitations that the transactional approach
    to the issue has.

    as a matter of fact, your explanation as to what
    exactly happens in terms of locks vis-a-vis isolation levels made me realize that i’d really love to get my hands on a succinct summary of the technical basis behind database transactions.

    specifically, i’m looking for a text that explains
    isolation levels, locking mechanisms, performance
    considerations and all that… basically all the
    stuff you seem to know so well.

    can you recommend a book, or a website that would help?

    thanks,
    franco

    Comment by franco — October 13, 2007 @ 12:00 am

  12. The major difference between @@identity and Scope_identity() is this one:
    SCOPE_IDENTITY() will return the Identity value generated in a table that is currently in scope.
    @@IDENTITY will return the Identity value generated in a table irrespective of the scope.
    In other words, if you insert a row in 1 table and there is one trigger that inserts a row in another table, the @@identity will give the identity on the second table (that is the last identity) while scope_identity will stay within the scope of the first table.
    It is a better practive to use Scope_identity when inserting, updating one row.
    If you inserts/updates multiple rows, use the output clause and the Inserted, Deleted “temporary” table within the insert, update operations.

    Concerning the multi-threading of Sql server: no matter what will happen within your coldfusion code side, you can only insert, one batch at a time, no matter how many rows you insert.
    There is no reason to get the last id you inserted unless you might want to update this very same row.
    Inserting new row should be done “blind”.
    There is only a few cases I can think of that would require you to know the last ids to insert afterwards. One of them if that this very same id carry some functional aspect like a combo (Date + _ + SomeIncrementalId).
    another reason would be to have several tables that would carry ids to set up some sort of horizontal partitioning on standard version but even so, you can have a job that would check the identity once every day and reseed your tables to make sure we will not get into eachother.

    Comment by Clement Huge — June 28, 2010 @ 2:04 pm

RSS feed for comments on this post. TrackBack URL

Leave a comment

 

Server Down?

Maximize Web application uptime by drawing upon Webapper's years of experience tuning and stabilizing many of the world's largest ColdFusion Web applications. Contact us today!