Improve ColdFusion Performance by 1000x? Believe Me, It’s Possible.
Not only is it possible, but I spend most of my time every week doing this for our Webapper customers.
As we all know, businesses and their supporting web applications are constantly changing and adapting. With any luck, change is in the direction of growth, and adaptation simply becomes a component necessary to keep up with that change. As businesses and web applications grow over time and exposure, so does the data. A tool or process in a web application that has been humming along and performing well over the years is subject to turning on a dime and making your life miserable as additional data and concurrent usage flood your application.
I recently fixed a classic example of this for a customer of ours. This particular customer runs (among other things) a pretty well utilized job posting and searching application which sits atop ColdFusion 8 and a SQL Server 2005 database. The site is averaging about 6,000,000 page views a month, so performance gains can be expected to have a broad impact when you factor them by that level of utilization. The customer was reporting all sorts of performance problems and slow running pages within their application but was at a bit of a loss in terms of specifically identifying the problem areas and addressing them. That’s when they called Webapper.
I never begin a performance analysis engagement like this for a customer without first installing SeeFusion. It’s quite simply one of the best tools I’ve ever worked with when it comes to quickly identifying the problem areas and usage trends of a ColdFusion application.
After a day of monitoring the application performance with SeeFusion, it became quickly evident that a very critical page in their application needed some tuning – their “Browse Jobs” page. I could see via the SeeFusion logging repository that this page was continually running long (anywhere from 20-45 seconds), and was accessed very frequently in comparison to other pages on the site. So, I started looking into why this page was causing such headaches for our customer.
The page delivers some pretty simple functionality. It first shows a total number of active jobs available in the application. Then, just beneath that, it generates a list of all 50 states, a count of the jobs listings available in each state, and a parameter-driven link to a second-level page that then displays the jobs for that state. Pretty simple concept. When I accessed the page, the average execution time I was getting was around 24 seconds (24,000 ms) per request. This clearly needed to change.
The first place I typically start when evaluating a page or process like this is with the SQL operations. No matter how clean and optimized your CFML code may be, if your SQL calls are lackluster in the performance department, then it bleeds into everything else. You can’t build a great house on a tattered foundation, right? So I always begin with tidying up the SQL performance before moving on to any potential CFML optimizations.
I could see this page was running 3 varieties of SQL operations. First, a simple parameterless query to the tblJobPostings database table in order to determine the overall number of active job postings available to the entire site. Second, another parameterless query to the tblStates table in order to get a listing of all states. And third, a loop through the query of all 50 states in order to run a by-state query returning the count of jobs in each particular state. Put them all together and you have a total of 53 queries. Now, I immediately jumped out of my seat at the idea of running the by-state job counts from within a loop. But, I stayed focused on first task of SQL optimization before getting sidetracked – I’d get to that part later.
The query running within the loop of all states to get each by-state job count looked something like this.
SELECT
count(jobPostingID) countJobs
FROM
tblJobPostings
WHERE
state = '#qStates.state[qStates.currentRow]#'
AND jobCode = 'ACT'
AND (
expireDate IS NULL
OR expireDate >= GETDATE()
)
</cfquery>
Seems like a pretty simple query, but looks can be so deceiving when it comes to SQL performance if you’re not looking under the hood. I could already see in the ColdFusion debugging output that the 50 instances of this seemingly simple query were taking about 400-500ms each to execute. Multiply that average execution time by 50 states, and you can quickly see why the page was taking an average of 24,000 ms to render. So, I cut-n-paste an example of this query from the ColdFusion debugging output and brought it over into Query Analyzer in SQL Server Management Studio. I paste the query into Query Analyzer, enabled the “Include Actual Execution Plan” option, and executed the query.
I could immediately see some problems in the execution plan. SQL Server was performing a pretty costly index scan operation and then combining those results with a key lookup operation. This meant that SQL Server was utilizing an available index to match part of the selection criteria presented in the query, but was then having to perform additional work and filtering in order to evaluate other columns that weren’t a part of the index being used. Immediately, I took a look at the indexes on the tblJobPostings table. There was a clustered primary key index and a few single-column indexes, but not one multi-column index. This immediately explained the execution plan I was staring at. So, I created a new index on the table that included state, jobCode, expireDate, and jobPostingID in that order, essentially to match the conditions being applied in the WHERE clause and then including the columns referenced in the SELECT clause. With this approach to indexing, SQL Server should now be able to get all the information it needs for this query from the new index without even directly accessing the table itself.
With this new table index in place, I returned to the web browser and reloaded the page.
Execution time: 12,400 ms
I had already effectively doubled the performance throughput of the page simply by creating that index. But, 12 seconds was still far from being an acceptable load time. So I moved on to step 2 – reorganizing the CFML code.
Experience has taught me that running <cfquery>s from within a <cfloop> is almost never a good idea. Not only can it be a system memory hog, but it’s often a shortcut for not understanding how to get that same data in a single database call. When I had first glanced at this code, I had already made a mental note that I would need to re-write this into a single database call, grouping the job counts by state, and organizing it in a way that I could later reference individual job counts by simply indicating a state.
Re-writing the query was the easy part.
SELECT
count(jobPostingID) countJobs
, state
FROM
tblJobPostings
WHERE
jobCode = 'ACT'
AND (
expireDate IS NULL
OR expireDate >= GETDATE()
)
GROUP BY
state
</cfquery>
This addressed the issue of getting all the counts in a single database call, but didn’t directly address the problem of how I would go about referencing the job counts from within the <cfloop> through all states. But this is a technique I use all the time, so I quickly moved onward. I had decided that converting this query into a structure would be my best option. I would create the structure with keys for each state abbreviation and a value of the respective job count for that state. The code looks something like this.
<cfloop query="qJobCountsByState">
<cfset stcJobCountsByState[qJobCountsByState.state[qJobCountsByState.currentRow]] = qJobCountsByState.countJobs[qJobCountsByState.currentRow]>
</cfloop>
The result of this code is a structure that visually might look something like this:
-------------------------------------
AL 12756
AK 9756
AR 11220
With the data organized this way, I can use the following bit of code to get the specific job count of a given state.
<cfset thisStateJobTotal = stcJobCountsByState[thisState]>
At this point, I was ready to move ahead with adapting the code within the <cfloop> to utilize this new data structure. Here’s a summary of original code.
<cfquery name="qStateJobCount" datasource="#request.myDSN#">
SELECT
count(jobPostingID) countJobs
FROM
tblJobPostings
WHERE
state = '#qStates.state[qStates.currentRow]#'
AND jobCode = 'ACT'
AND (
expireDate IS NULL
OR expireDate >= GETDATE()
)
</cfquery>
<cfoutput>
#qStates.state[qStates.currentRow]# ( #qStateJobCount.countJobs# )
</cfoutput>
</cfloop>
But with the new single grouped query to get job totals, and the conversion of that query to a structure, I could re-write the code to look something like this.
<cfset thisJobCount = 0>
<cfif structKeyExists(stcJobCountsByState,qStates.state[qStates.currentRow])>
<cfset thisJobCount = stcJobCountsByState[qStates.state[qStates.currentRow]]>
</cfif>
<cfoutput>
#qStates.state[qStates.currentRow]# ( #thisJobCount# )
</cfoutput>
</cfloop>
It’s worth noting that the structKeyExists() call is critical here since the possibility exists that there may be 0 active jobs for one or more states that were separately returned from the states table in qStates. For example, if “TX” exists in the states table, but there are no active jobs in tbjJobPostings for state “TX”, then there will not be a record in the qJobCountsByState query for “TX”. That means a reference to stcJobCountsByState["TX"] would return an error indicating that the specified key does not exist. The structKeyExists() method call addresses this issue.
With this new method of deriving by-state job counts in place, I returned once again to the web browser to see how my work was paying off in the performance department. I clicked the refresh button in my web browser and scrolled to the debugging output.
Execution time: 2,120 ms
Alright, well now we’re getting somewhere. Typical human factors studies of web users would indicate that 2 seconds is reasonable when it comes to perceived wait times for a given page in a web application. However, I knew I could do better than this. I knew that the list of all states wasn’t likely to change very often (if ever), and that the job postings on the site were only updated once a night by a scheduled import process. This means that during the business day when the site is being access by the end users, the data being utilized to render this page isn’t changing. While I’m typically an advocate of always writing your own caching methods in ColdFusion for better control and management, this seemed like an ideal opportunity to take the easier path and simply cache the database queries on this page using the cachedWithin argument of the <cfquery> tag. So, I returned to the code and altered it by adding a 1 hour cache directive to the 3 remaining <cfquery> tags.
With this change in place, ColdFusion will handle all the logic associated with caching and refreshing the cached data from these queries every hour. One request per hour will run the queries against the database, but any other requests in that same hour will be fetching the cached query object from ColdFusion’s in-memory cache. Once again, I returned to the web browser, clicked my refresh button, and scrolled to the debugging output.
Execution time: 24 ms
Bingo! The page rendered almost faster than I could release the browser refresh button. I could see from the debugging output that the page was now referencing the 3 queries from the cache. Granted, once an hour a single request will actually have to run them against the database, but even then the execution time will only take about 2 seconds.
So, from an original page that started off with an average execution time of 24 seconds (24,000 ms), I had worked through a series of optimizations and come to rest on a new average execution time of 24 ms. In the span of roughly 30 minutes, I had created a 1000x increase in the performance throughput of this commonly requested area of the application.
If you have an area of your existing web application that seems to be suffering from some performance challenges, I’d encourage you to have another look and see what you can do about it. Review your SQL statement performance, optimize your database indexes, tweak your application logic, and apply caching where you can afford to do so. If you don’t feel comfortable doing it on your own or simply would prefer to have someone else do the work for you, then pickup the phone, call Webapper, and put our expertise in this area to work for you. The result just might be a 1000x increase in performance.

Thanks for the blog entry!
I wonder how much performance improvement you can get with just cachedWithin.
Comment by Henry Ho — August 4, 2009 @ 1:25 pm
Why not do a left join on the states table in the query where you retrieve the number of jobs per state? Unless you need the state query somewhere else in the page, there’s no need to do 2 queries where only 1 is needed. While a left join is potentially expensive, properly indexed it shouldn’t be that much more than what it is doing already.
Great article. It’s code like this that makes me think that code reviews are important. I saw a lot of the same methodology you described (looping over a query and running another query for each row) in the application I work on when I started. Needless to say, that code is long gone
Comment by James — August 4, 2009 @ 2:41 pm
Great post. Everything you say is so true.
Developers need a little bit of the dba in them or a tame one to hand.
Table indexing, writing the right query for the job and reducing the number of visits to the database are essential to every application .
Btw you should be able to fulfill this page with one query.
Comment by Stephen Moretti — August 4, 2009 @ 3:19 pm
Henry, I suspect that if the original code were left as is, no additional indexes were created, and the only change was to add cachedWithin to all the CFQuery tags in this example, the immediate performance boost would have had a similar result in that each cached query would require 0 milliseconds to run.
However, the problem would be that everytime the cached queries expire it would take 20-45 seconds to refresh before caching them again. During that 20-45 seconds while the query caches are refreshed, all incoming requests will also take 20-45 seconds until the cache is refreshed because each request will be hitting queries that have been expired but not yet refreshed. Essentially, that’s a stop the world type of event which should be avoided.
With Tyson’s changes, when its time to refresh the query caches, its a 2 second event, which itself is a significant improvement and will likely go unnoticed during the refresh interval.
Comment by Steven Erat — August 4, 2009 @ 3:27 pm
@James: You’re absolutely right. I could eliminate some logic here by performing a left join and at least returning a record indicating the proper count of 0 in the query object. Because of how I planned to convert the data into a struct, however, I had decided against that approach. I simply didn’t want more keys in the struct than absolutely necessary and was willing to take the hit of the structKeyExists() calls to determinr if a value was present for each state. I guess I was sort of taking that approach with a thought towards other situations where a similar pattern might apply. Sure, in this case we’re dealing with 50 states as a baseline. But, what if we were dealing with thousands of categories instead of 50 states? The combination of the left join and the loop to convert to a struct might then start to add up negatively. Nevertheless, excellent comment – thanks for bringing it up. I think it’s ultimately one of those sorts of decisions that each developer has to make subjective to their own specific situation, data, application, and needs.
Comment by Tyson Vanek — August 4, 2009 @ 6:11 pm
@Stephen: Thanks for the kudos. Yes, a little DBA knowledge in the hands of a ColdFusion developer can go a long way toward avoiding these sorts of application design landmines. And you’re right – it’s entirely possible to have returned all the necessary data with a single database query. But, I was also working within tight time constraints at the request of our customer. Considering this took all of 30 minutes and wasn’t a drastic overhaul of their code, it was a happy middle ground.
Comment by Tyson Vanek — August 4, 2009 @ 6:14 pm
@James: By the way, I’m glad to hear that my article inspired you to do some housekeeping of your own application. That was precisely the larger point of the article. =)
Comment by Tyson Vanek — August 4, 2009 @ 6:15 pm
@Henry: Steven has absolutely hit the nail on the head. While the cachedWithin may have been enough to place a bandage over the problem, I also wanted to ensure that the live hits to the page were snappy when the cache expires. The client may very well change their process in the future to include more frequent updates to the job postings. That decision in turn may alter their allowable cache time for these queries. I wouldn’t want them to be limited in the future based on the caching strategy and load time alone.
Comment by Tyson Vanek — August 4, 2009 @ 6:18 pm
Good stuff, I use a lot of those techniques, but i never would use cachedWithin
as long as your app is the only client of the database, I would rather persist the
query in the application (or controller for us lucky framework types) scope
and simply purge/update the cache when any update/insert occurs
I must say all the techniques / approaches are actually just about cleaning
up sloppy lazy code which is equally applicable in any programming language
Comment by zac spitzer — August 5, 2009 @ 4:18 am
@zac: Yeah, I typically avoid the use of ColdFusion’s built-in caching capabilities as well since they’re considerably difficult to manage. I have a number of my own caching solutions I would normally use instead. But, as I mentioned, I was against a bit of a time crunch to make things happen, so I went with the quicker solution of cachedWithin. But you’re absolutely right – caching to shared scope is something I typically try to do instead in situations like this.
Comment by Tyson Vanek — August 5, 2009 @ 9:55 am
Just wanted to send a Thank You for this blog entry.
I write software about 10% of my job and fine tuning details evade me often.
While my code doesn’t need to run very fast (10 users, internal use only), it was amazing to take one of my really complex query pages drop from a rather dismal 21.344 seconds to a lightning fast 3.094 seconds (with just a couple of hours of code writing).
Thanks again!
Comment by Russell — December 9, 2010 @ 7:42 pm
I know this is an old blog post, but it’s one of the most useful things I’ve read in a long time. I’m one of those people that taught themselves CFML/SQL because I could and needed to but am really not a tech. The webstuff I do is free on my own time (not my job, anymore). I do a lot of stuff that is a lot more manual — kind of like using 7 steps of clunky general math to address something that you could do in a single more elegant solution with algebra or geometry, in part because as a single mom with a crazy work schedule, I just don’t have time to spend 3 days trying to figure out what I know I can get to in an hour if I just muck about with it long enough with ‘logic’. Alas, I now have a database on a website that is huge and it’s killing me, not just because the tables probably aren’t ideal at all, but because I know the code probably sucks (duct tape and thumb tacks, I call it). Your example here is just the kind of thing I need to learn/ better understand in order to clean some of this up and restore decent functionality to some of my hobby sites. I really appreciate your taking the time.
PJ
Comment by PJ — September 30, 2011 @ 10:06 pm