The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

SQL Reporting Services for complex logic reports

My company is in the midst of replacing our dog-slow, web-based report system with SQL Server Reporting Services.  We have a demo up, but the problem is that in the existing reports the previous developer did basically all the summing and everything in code, including things like concatenating multiple invoice numbers onto a single order.

Basically, he would pull everything from the database and run a series of "do while" loops over the recordset (this is Classic ASP, so it's all a mess of VBScript) to sum the totals of various columns or check which price column to use in certain cases (different types of customers, and the table itself contains both price columns for some unknown reason), and display the raw results directly from the code (no variables, just straight Response.Write rs("column_name")).  According to our users, the reports are accurate but take upwards of 20 minutes to run, if they don't time out.  I've tried to improve the execution time, but to no avail: the server-side code is just too large to improve without rewriting the whole page from scratch, and the previous developer's code is nearly impossible to decipher because the cryptic variable names look like they were stripped of any vowels, and there's no indentation whatsoever.

The problem is that there doesn't seem to be any way to do this kind of logic in SSRS, and thus our demo reports are being rejected as inaccurate, or as not having the same functionality as the old reports.  I've tried to do it in SQL through aggregates, but this doesn't work: the order total, for example, is ridiculously high when I use a SUM in SQL, but comes out right in the old guy's code, which loops through a recordset hundreds of times and adds each value to a running total.  There also doesn't appear to be any way at all to group together related invoice numbers (which are strings/varchar) so they appear on a single line.

Worse still, in some places he displays the actual order details... and in some bout of madness decided to store them as a comma-separated list in a text field, so his code splits it and then loops through the results to build up the details and sum some totals.  He even uses a counter variable to track which position of the detail he's at.

Without diving into some really hairy SQL, is there any way that we can get similar functionality in SSRS to the old mess?  I'm at the point where I'm tempted to just re-write the reports in ASP.NET so they'll run faster, but SSRS seems like a much nicer choice and gives us some more features to use... it's just that the old reports were constructed entirely on the server-side and not in the database, and SSRS appears to work the opposite way.
WayneM. Send private email
Thursday, October 09, 2008
 
 
You can get SSRS to do the counting/sums/aggregation for you by creating groups and then aggregating in the header or footer. So that might work for part of your problem.

For the issue where you get back a comma delimited list and need to create report sections, you might be out of luck without some major work. You can certainly split up the list using VB code within the report, but you don't really have the ability to dynamically create new report sections. You'd have to create a single field and then use VB to format the data as desired and stick it in that single field I think. Kinda ugly.
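
To sketch what that VB formatting code would look like (illustrated in Python for brevity; the item:qty:price field layout is invented here, since the thread doesn't show the real one):

```python
# Hypothetical comma-separated detail field, as described in the thread.
details = "widget:2:5.00,gadget:1:12.50"

lines = []
total = 0.0
for i, item in enumerate(details.split(","), start=1):
    # Split each detail into its parts and accumulate a running total.
    name, qty, price = item.split(":")
    line_total = int(qty) * float(price)
    total += line_total
    lines.append(f"{i}. {name} x{qty} @ {price} = {line_total:.2f}")

# The single formatted field the report would display.
formatted = "\n".join(lines) + f"\nTotal: {total:.2f}"
print(formatted)
```

The report would stuff `formatted` into that one field, ugly as that is.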

If you are willing to consider rewriting the report in ASP.NET, why not just rewrite the SQL queries instead to return data in a more appropriate form and keep using an SSRS report?

Thursday, October 09, 2008
 
 
This may not be what you want to hear.

In the past, I've had terrible luck with trying to produce valuable reports from an OLTP database using simple reporting tools.  If you are given data stored for operational purposes, and you need data used for analytical purposes, you have to transform it, big time.  I think that's where you're running into grief.

Where I have gotten excellent results is in producing "reporting databases" that store the data in a different form, one that is suitable for hundreds of different reports.  Star schema design as used in data warehousing has served well for this purpose. 

Writing the process to extract the data from the main database, transform it to the reporting database format, and load it into the reporting database has been a challenge, but one worth meeting.  There are tools that help with this,  but it isn't easy even with a tool.

Once those two things are done,  using just about any reporting tool to produce reports that are relevant, accurate and useful is fairly quick and easy. The same is true for extracts,  table links to desktop tools like MS Access,  or interactive analytic tools, like Business Objects.

You may have a hard time convincing your clients to pay for the initial up-front development of a reporting database.  And you may face a tough learning curve before you know how to build a solid, relevant star schema.
But this really does work.
Walter Mitty Send private email
Thursday, October 09, 2008
 
 
"If you are willing to considering rewriting the report in ASP.NET, why not just rewrite the SQL queries instead to return data in a more appropriate form and keep using a SSRS report?"

---

This is what I was trying to do, but the results are inconsistent with the current solution because I can't do the same kinds of aggregation and grouping in SQL that the current implementation does via VBScript.  I thought I had deciphered enough of the current code to figure out what it was doing, but I must have missed something because the results are not the same.  For example, he (the previous developer) looped through a recordset to calculate the summed total; I tried to do this with a SQL SUM statement, but got a different amount.
WayneM. Send private email
Thursday, October 09, 2008
 
 
Well, if you're on SQL Server 2005, you might consider CLR functions.  You can write functions in your .NET language of choice, install them on SQL Server, and call them from your SQL like other SQL functions.

With CLR functions I believe you can write your own aggregate functions, which you can't do with normal SQL Server functions.
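
CLR aggregates themselves are written in .NET, but the idea of a user-defined aggregate can be sketched language-neutrally. Here SQLite's create_aggregate hook (via Python) plays the role a CLR aggregate would on SQL Server; the price1/price2 columns mirror the thread's scenario, and everything else is invented:

```python
import sqlite3

class PriceSum:
    """Custom aggregate: sum qty * price, choosing price1 or price2 by customer type."""
    def __init__(self):
        self.total = 0.0
    def step(self, customertype, qty, price1, price2):
        # Called once per row, like a CLR aggregate's Accumulate method.
        self.total += qty * (price1 if customertype == 1 else price2)
    def finalize(self):
        return self.total

conn = sqlite3.connect(":memory:")
conn.create_aggregate("price_sum", 4, PriceSum)
conn.execute("CREATE TABLE lines (customertype INTEGER, qty INTEGER, price1 REAL, price2 REAL)")
conn.executemany("INSERT INTO lines VALUES (?, ?, ?, ?)",
                 [(1, 2, 5.00, 4.00),   # type 1 -> 2 * 5.00 = 10.00
                  (2, 1, 5.00, 4.00)])  # type 2 -> 1 * 4.00 =  4.00
total = conn.execute(
    "SELECT price_sum(customertype, qty, price1, price2) FROM lines").fetchone()[0]
print(total)  # 14.0
```
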
Ted
Thursday, October 09, 2008
 
 
The main issue is how the current system concatenates certain parts onto one line, and this doesn't seem possible in SQL since the columns in question are text (in the current system they are actually hyperlinks to extra details).

For example, this in the database:

order_id    invoice_no    total
12345        AB0043        10.00
12345        BE3430        20.00

would be this in the current system:

order_id    invoice_no          total
12345        AB0043, BE3430      30.00

But this doesn't seem possible in raw SQL.
WayneM. Send private email
Thursday, October 09, 2008
 
 
@WayneM:
You can do that easily in SQL. Just do a pivot, then a case/isnull to concatenate the invoice numbers.
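
As a runnable sketch of the target output (not the pivot approach; this uses SQLite's group_concat via Python purely for illustration, since T-SQL would need the FOR XML PATH trick shown later in the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (order_id INTEGER, invoice_no TEXT, total REAL)")
conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)",
                 [(12345, "AB0043", 10.00), (12345, "BE3430", 20.00)])

# One report line per order: invoice numbers concatenated, totals summed.
row = conn.execute("""
    SELECT order_id,
           group_concat(invoice_no, ', ') AS invoices,
           SUM(total) AS total
    FROM invoice
    GROUP BY order_id
""").fetchone()
print(row)
```

This produces exactly the single-line shape from the example above (concatenation order is not guaranteed without extra work, in SQLite or SQL Server).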

Have you considered using CLR stored procedures to re-create the non-trivial queries? They will allow you to have much greater flexibility, compared to vanilla SQL.

I get what Walter is saying about extracting OLTP data to a warehouse, but this is not always necessary, and sometimes inappropriate, depending upon your specific requirements.
Scorpio Dragon Send private email
Thursday, October 09, 2008
 
 
What I'd do is forget trying to do raw reporting off of naff data (i.e. comma separated stuff). You seem to have a good handle on what you want in the report and the data you are starting with, so break it down into multiple steps.

You need an intermediate step that will massage the data into a usable form - then write the reports off of that. SSIS could be used for the intermediate step. I haven't had a good dig around SSIS, but the older version in SQL Server 7 was DTS, and it was easy to do the sort of thing you're talking about (you could use VBScript to split the strings etc. and dump them into a new table).

Haven't really worked with Reporting Services either, but you may be able to link an SSIS package to the report (i.e. run the SSIS package that does the intermediate step, then run the report). Either that or just daisy-chain them on a schedule.
Anon
Thursday, October 09, 2008
 
 
If your new results don't match the old results, then at least one of you is wrong. I'm sure that most of us have found cases where an old favourite report was actually incorrect. Please don't assume that you're wrong just because you have different results.
Ron Porter Send private email
Thursday, October 09, 2008
 
 
Truth be told I don't know which of us is wrong, but to the end user the new version is wrong because the old version was right (even if it wasn't) and it's what they know.
WayneM. Send private email
Thursday, October 09, 2008
 
 
I think the issue is that the SQL code is not consolidating multiple invoices into one order line, but displaying them as multiple lines... then when everything is summed at the end, these orders are counted 2+ times instead of only once (the total amount is for ALL of them, but they are stored as separate rows in the table).
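
A minimal sketch of that double-counting effect (SQLite via Python; the assumption, per the description above, is that each invoice row repeats the whole order total). Collapsing to one row per order before summing fixes it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inv (order_id INTEGER, invoice_no TEXT, order_total REAL)")
# Assumed data shape: each invoice row carries the WHOLE order total (30.00).
conn.executemany("INSERT INTO inv VALUES (?, ?, ?)",
                 [(12345, "AB0043", 30.00), (12345, "BE3430", 30.00)])

# Naive SUM counts the order total once per invoice row.
naive = conn.execute("SELECT SUM(order_total) FROM inv").fetchone()[0]
# Collapse to one row per order first, then sum.
fixed = conn.execute("""
    SELECT SUM(order_total)
    FROM (SELECT DISTINCT order_id, order_total FROM inv)
""").fetchone()[0]
print(naive, fixed)  # 60.0 30.0
```
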

I have a hunch if I solve that issue, then I will solve the other problems, although one report is off by a few thousand dollars for unknown reasons.
WayneM. Send private email
Thursday, October 09, 2008
 
 
I'm doing a massive amount of work in SSRS and it really is a very capable product.  In a few cases, we found that the reports I've been creating produce a different result than the old reports, and the old reports were found to be wrong.  Sure, there are lots more cases where my reports were wrong, but those were all simple fixes.

As for concatenation, I've had good luck with this SQL construct:

select
    o.order_id,
    sum(o.total) as total,
    rtrim((select i.invoice_no + ' ' as "text()"
           from invoice i
           where i.order_id = o.order_id
           for xml path(''))) as Invoices
from [order] o
where ...
group by o.order_id

You can separate the invoice numbers with commas instead of spaces (as here), then use SUBSTRING to lop off the trailing comma (RTRIM only strips spaces).
Almost H. Anonymous Send private email
Thursday, October 09, 2008
 
 
Probably a bad suggestion - but I would always recommend SAS (http://www.sas.com) or SAS-like tools to do this kind of work, as the ETL part will become much simpler and intuitive, along with the capability to publish HTML reports. Of course, you need to convince management to purchase the necessary licenses.
Anindya Mozumdar Send private email
Thursday, October 09, 2008
 
 
I get what Scorpio is saying about extracting to a warehouse (or to a reporting database) being sometimes unnecessary or inappropriate.  When I said that "I have gotten excellent results", I meant just that.  I wasn't trying to suggest a universal best practice. Still, my experience is one useful example.

Meanwhile, I'm confused by the rest of the thread as it emerges.

On the one hand, the difference between a report that shows separate invoices on separate lines and one that shows invoices separated by commas strikes me as cosmetic.  Cosmetic issues may be important to the people who use the reports, but they need to be understood as cosmetic.

On the other hand, the question about whether existing reports,  or the proposed new reports are "wrong" seems anything but cosmetic.  I would think it would be necessary to debug the new reports, or perhaps demonstrate bugs in the old reports,  by tracing back to the data.  I would not let the subjectivity of report users prevail over the facts. 

I didn't see a comparison between two reports that deliver different results.  Maybe it's in the discussion, but I missed it.  The question of whether or not a new report is doing double counting when aggregating is one that ought to be straightforward to answer.
Walter Mitty Send private email
Friday, October 10, 2008
 
 
Somehow, this whole discussion is scaring me like very few things can. This is NOT the time to worry about SSRS or ASP.NET. Instead, the original poster needs to find out more about the logic of the report itself, and DOCUMENT IT.
Raj Chaudhuri Send private email
Friday, October 10, 2008
 
 
I would go through the old logic with a fine-tooth comb, find out *exactly* what is happening, and dry-run that against known source data. I would then verify the results with business users, showing them both source data and report, and check whether that is correct. Then, and only then, would I try to figure out whether the old implementation was "wrong", or how to create a new implementation.

SSRS, like any reporting engine, ultimately uses control-break logic. Unlike most reporting engines, it gives us the ability to hook into any part of the control-break via expressions or VB.NET functions. So it should be possible to exactly duplicate a complex report in SSRS. Whether this should be done is debatable.

But that debate can only take place after we know *exactly* what the old code is doing, or better still, what source data should produce what results. This is a job for Clark Documentor Kent, not SuperCodeOrToolMan.
Raj Chaudhuri Send private email
Friday, October 10, 2008
 
 
@Raj:
I don't agree. I think if you can show that the new report is correct, then you can ignore the old one. QED.

I would not spend ages ploughing through it to understand something that will be thrown away soon anyway. If it turns out that the old report was "wrong" all this time, then so be it. That is something the users will have to reconcile (mind the pun).

Of course, I have heard of rare cases where the users want the new report to have the same mistakes, so that it is wrong in a consistent way. Insurance triangles can be like this, although I'd always provide the correct one first and then the "broken" one as an alternate.
Scorpio Dragon Send private email
Friday, October 10, 2008
 
 
I completely understand; unfortunately, trying to decipher the old code is a project unto itself, since the previous developer was highly cryptic in his variable names (usually what looks like a string of random characters), re-used variable names for different things (so even if a name is readable, it might not be accurate anymore), and used little or no indentation at all.

Not to mention the fact I have my users breathing down my neck expecting it to be a trivial task, when it's really not and requires a fair bit of detective work to figure out the nuances of the old code (I basically have to re-write it using good variable names, just to understand what the hell it's doing) before I can even attempt to tackle the new reports.

Looks like I had better get back to work, then ;-)
WayneM. Send private email
Friday, October 10, 2008
 
 
"the new version is wrong because the old version was right (even if it wasn't)"

This is actually one of my favourite things. I might have to eat a bit of crow if I produced the original, but some domain expert will be dining with me :)

The reason I like finding stuff like that is because it really gets everybody thinking hard about what is *really* supposed to be happening. A bit of controversy over the correctness of results generally ratchets everybody's domain knowledge up a notch.

Go figure out exactly what the old report is doing. Once you know how inputs are transformed to outputs, you'll be in a better position to determine what you need to do to duplicate the transformation. You'll also be in a better position to at least discuss the correctness of the transformation.
Ron Porter Send private email
Friday, October 10, 2008
 
 
Dealing with the report that has always been wrong, but enjoys high credibility among users is an issue that will rear its ugly head no matter what methodology you adopt. 

There's one exception to this general rule: if you decide to make your new reports bug-for-bug compatible with the old reports.

A classic example of bug-for-bug compatibility is the date serial number in MS Excel. 

In the calendar most of us use, the Gregorian calendar, the day after February 28, 1900 was March 1, 1900.  But if you use MS Excel, you will find that it intercalates February 29, 1900 between these two dates.  As a consequence, MS Excel delivers the wrong day of the week for dates 01/01/1900 through 02/28/1900.

In this regard, MS Excel 2007 is bug-for-bug compatible with the earliest version of MS Excel.  And, in turn, it's bug-for-bug compatible with Lotus 1-2-3 in this regard.

Excel is not, however, bug-for-bug compatible with MS Access in this regard. MS Access correctly omits February 29, 1900.

You should be prepared to prove, at least to your own satisfaction, that the new reports are correct.  Then, if the old reports disagree, you should be prepared to write them off as wrong, at least to the degree that they differ from what's correct. 

You may have to deal with people's preference for the old reports as a psychological issue, instead of as a logical issue.
Walter Mitty Send private email
Friday, October 10, 2008
 
 
"Basically, he would pull everything from the database"

Everything?

It can't even query on an index value or a select criterion? What kind of database engine doesn't allow that at least?
Rowland
Sunday, October 12, 2008
 
 
It's not that it couldn't, it's that the original developer *didn't*, and chose to do all aggregates and whatnot in code.  Think along the lines of:

mysql = "SELECT * FROM orders where idorder = '" & request("idorder") & "' order by orddate"

call getfromdatabase(mysql, rst)

do while not rst.eof
ttl = ttl + rst("ordttl")
shpttl = shpttl + rst("shipchrg")
othttl = othttl + rst("othchrg")
ordnotes = rst("ordnotes")

ntes = split(ordnotes, ",")
c = ubound(ntes)

for i = 0 to c
  ' ??? something concatenating the order notes
next

mysql = "select * from customers where idcustomer = '" & rst("idcust") & "'"
call getfromdatabase(mysql, rstmp)

response.write "<table>"
' html output
response.write "<td>" & rstmp("custnam") & "</td>"
loop

That pretty much sums up how the code is; no indentation, reusing variables on the fly (no Option Explicit), variable names that are pretty much the same thing with minor differences or totally meaningless.

I attempted to bring the summing logic out of the code and into the database, but like I said earlier, it doesn't seem to give the same results (whether the original version is doing the right thing or not, I have no idea, since trying to read it is like trying to decipher hieroglyphics), and I'm running into trouble trying to format it the same way.

Sunday, October 12, 2008
 
 
"It's not that it couldn't, it's that the original developer *didn't*, and chose to do all aggregates and whatnot in code..."

Holy poo that style of code won't scale worth a damn. I won't insult anyone's intelligence by explaining why not. I'm sure we can all see why not.

You probably don't need to migrate to a whole new platform to solve this. The main problem isn't the platform, the main problem is the code.

There's a lot of risk in overhauling or rewriting a sort-of-working system whose only real problem is it doesn't scale. My approach would be to find what looks like it would be the biggest resource suck in the code and spot refactor it. Repeat as necessary. Test carefully as you go along and have backups handy in case you make a boo boo.

One great thing about refactoring a technically correct system is you gain a detailed understanding of its intent as you go along. Whereas if you try to rewrite from scratch it might diverge functionally in some subtle but fatal way.

Another great thing is you make the effort and reap the benefits in installments. Every iteration of spot refactoring makes the system less slow than it was before. When it's tolerable you can call it quits and leave the rest of it well enough alone.
Rowland
Monday, October 13, 2008
 
 
The main issue is that there is so much logic done in the code that stripping it out would make the SQL really hairy.  For example, the code checks values returned by SQL and performs additional calculations based on them: e.g. if the customer is from a specific company, then sum up these two values, add in the sum of another value, and apply a certain markup.  In order to get the same functionality I would have to make some user-defined functions specific to these reports to do this aggregating, instead of code like this:

case when company = 'abc'
     then sum(quantity * isnull(cost, 0.00))
          + isnull(user1, isnull(user2, 0.00))
          + case when customertype = 1
                 then sum(qty * price1) * .15
                 else sum(qty * price2) * .15
            end
end

(I'm not even sure if that SQL is valid, since typing it out made me go cross-eyed, but you get the idea.  To show how the original developer thought, the fields "user1" and "user2" are designed to let end users override shipping.  I don't know why there are two fields instead of just one, but the current code checks both to see if there's a value in either.)

However, I want to avoid doing specific functions because there is no real common ground between reports, so each report would basically have half a dozen functions that only it uses, and no other.
WayneM. Send private email
Monday, October 13, 2008
 
 
Also, I don't think I can even abstract out that hairy code into variables, because it needs to be evaluated for every row returned by the query.
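
For what it's worth, a CASE expression placed *inside* the aggregate is evaluated per row, which may cover exactly that need. A sketch (SQLite via Python; column names borrowed from the example above, table name and data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE detail
    (order_id INTEGER, customertype INTEGER, qty INTEGER, price1 REAL, price2 REAL)""")
conn.executemany("INSERT INTO detail VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 2, 5.00, 4.00),   # type 1 uses price1: 2 * 5.00 = 10.00
                  (1, 2, 3, 5.00, 4.00)])  # type 2 uses price2: 3 * 4.00 = 12.00

# The CASE runs once per row, so the per-row price choice happens in SQL.
total = conn.execute("""
    SELECT SUM(CASE WHEN customertype = 1 THEN qty * price1
                    ELSE qty * price2 END)
    FROM detail
""").fetchone()[0]
print(total)  # 22.0
```
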
WayneM. Send private email
Monday, October 13, 2008
 
 
If the report is this complicated, I would recommend that initially at least, you continue running this as a custom report, perhaps re-written using ASP.NET.

It is possible to do reports of this sort in Reporting Services. One possibility is to use custom VB.NET functions. You may call these anywhere in a report section. The code can be sensitive to the context: for instance, in group headers or footers of a table layout, you may use built-in aggregate functions whose scope is just the group.

Using Reporting Services will mean that you have to learn the SSRS framework in a great deal of depth, and work within its parameters. What it will get you is a report that can be (among other things) generated on a schedule, in a variety of formats including PDF.

You may decide to do this eventually. But like I said earlier, the logic of the report itself is more important than the technology. If you end up re-writing or refactoring the report as code one time, I'm sure you will learn a lot about the report itself, and then will be able to take a much more informed decision regarding what technology to use and how.

Not to mention the fact that the next guy will probably bless you for properly documenting the report(s) this time around. :-)
Raj Chaudhuri Send private email
Monday, October 13, 2008
 
 
"The main issue is that there is so much logic done in the code, that it would make the SQL really hairy to strip it out."

Trying to move all that logic from the program to SQL wouldn't be the first thing I'd tackle. First, come up with a WHERE clause that weeds out some of the unneeded rows. See what that buys you in speed. If that doesn't do it, here's the next step: get rid of the marshaling of the whole rowset into RAM in the getfromdatabase function; replace it with logic that fetches each row in a loop and processes that row completely before fetching the next row.

You can even do the function of a join in code by nesting two different row fetching loops on two different queries. That's not a good substitute for doing a proper JOIN in SQL but it might work as an intermediate step in your refactoring.
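
A sketch of that intermediate step (SQLite via Python; minimal tables built from the thread's column names, data invented): the per-row lookup loop next to the JOIN it should eventually become:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (idorder INTEGER, idcust INTEGER, ordttl REAL);
    CREATE TABLE customers (idcustomer INTEGER, custnam TEXT);
    INSERT INTO orders VALUES (1, 10, 25.0), (2, 11, 40.0);
    INSERT INTO customers VALUES (10, 'Acme'), (11, 'Globex');
""")

# Intermediate step: a per-row lookup loop (a nested-loops join done in code).
rows = []
for idorder, idcust, ordttl in conn.execute(
        "SELECT idorder, idcust, ordttl FROM orders ORDER BY idorder").fetchall():
    (custnam,) = conn.execute(
        "SELECT custnam FROM customers WHERE idcustomer = ?", (idcust,)).fetchone()
    rows.append((idorder, custnam, ordttl))

# The proper JOIN the refactoring should eventually reach.
joined = conn.execute("""
    SELECT o.idorder, c.custnam, o.ordttl
    FROM orders o JOIN customers c ON c.idcustomer = o.idcust
    ORDER BY o.idorder
""").fetchall()
print(rows)  # [(1, 'Acme', 25.0), (2, 'Globex', 40.0)]
```

Both produce the same rows; the loop version just keeps memory use per-row while you refactor.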
Rowland
Tuesday, October 14, 2008
 
 

This topic is archived. No further replies will be accepted.
