Back to Papers and Articles

Report Driven Design


Copyright Paragon Corporation   ( May 31, 2006)

Why do we build database applications?

One of the main functions of database-driven applications is to produce useful reports. We spend all that effort collecting information with the hopes that with the click of a button, we can generate reports that tell us how much our customers owe us, how much profit we are making, where are the bottle-necks in our process, who are the best customers, or what are our high-priority items.

We also build applications to do ad-hoc queries for example to quickly bring up the profile of a customer or set of customers, or to find out where is an order given some quick inputs such as the customer type, entry date. These can be considered reports too.

Ironically when designing a database application, reports are the last thing on an application architect's mind and instead are relegated to the "Do that after we've got the data model" bin. We focus more on the laundry list of things we want to track instead of how we want to use this tracked information. We can't have reports until we've got data to report on. This is a bit of a tragedy because a lot of the database and application design could be simplified/optimized with a basic understanding of the reports that are expected of the system. There are many ways to store data in a database, and some approaches may provide more features and ease-of-use in certain areas, but at a cost. How you structure your application and data can make certain calculations difficult to compute or a specific report layout difficult to achieve.   Knowing about the needed reports before-hand can help drive the data model and application design and save a lot of effort later on down the line.  

The need for Report-Driven Design

We often hear about the terms model driven architecture, service-oriented architecture, feature driven development, and test-driven development. Why not report-driven? Reports are something that everyone knows is needed, something that an end-user can understand, and something that is not filled with arcane terminology such as objects, components, paradigms, entity relational diagrams,  and UML.  It is so close to the problem that we just simply take it for granted and overlook it.   Historically, there was not much flexibility in reporting.  The way your screen looked was pretty much what you could expect from your report with some slight shifts in presentation.  When you would show a user a screen you were in effect designing the report as the application was evolving.    Something happened with technology down the line and this was no longer true.   The balance changed from merely collecting data to using that data more effectively to drive business change.   In the minds of many users, this subtle change in trend has not caught on although most application architects are quite aware that the design of your screens and your underlying data structure does not need to look anything like the final output.  However the design of your screens and data-structure does control what kind of report presentations you can easily produce.

Reporting and Business Intelligence as a whole is a relatively new phenomenon.  In the past computers were not reporting tools.  They were simply expensive typewriters that saved you from using white-out.  When you wanted to find out information you pulled out the files from a filing cabinet.  When you wanted a summary report, you rummaged thru your files and counted them and wrote it in a document. Then we moved on to using the computer filing system as a virtual filing cabinet. Then we realized now that we've got so much virtual space and so many automated ways of producing data, we can store so much more to the point that we can't find anything anymore so we need a better way to organize and find data.   Now that we could store more data and were generating more data we needed to aggregate this data into more meaningful nuggets to prevent information overload.   Search engines such as Google and AltaVista and advancements in AI and search algorithms improved our ability to find relevant data. Now we are reaching the next plateau - the presentation of meaningful data in a way that makes it easy to grasp trends and  abnormalities with a quick blink of an eye.    This new age is often referred to as Business Intelligence. Counting files and manually creating reports from relevant data is too time-consuming of an exercise and just doesn't cut it anymore.

Flexibility, Features, Ease Of Use And Complexity

There is a constant struggle between flexibility, features, ease of use, and complexity.   Before I go any further, let me define what I mean by these terms.

Flexibility - the ability of an application to be easily adapted to another use or another environment  that was not predefined in the onset
Features - bells and whistles - what an application can do and was designed to do.  How quickly does it speed up otherwise manual processes.
Ease of Use - how easy it is for an untrained or technically challenged user to use the system.  How intuitive is the system to use.
Complexity - How difficult an application is to maintain, fix when problems arise or extend as business need changes

The challenge is that we want infinite flexibility, infinite features, a self-intuitive system that your grandmother can use  and at a minimum complexity under the hood.  The more feature-rich your application, the more complex it becomes or the more difficult it is to use.  The more features it has generally the less flexible it becomes in certain areas (for example it won't work on your beat-up 386 pc or fit on your floppy, now you need a server farm of quad-processors), and the more flexible it is in other areas - you can control your new vcr with it because it has a usb port and your vcr has a usb port.  The easier an application is to use, the less feature-rich it is (it can do only one thing for example, although it may do it really well so there is not much to learn), or the more complex the underlying structure that does automatic calculations and input to give the illusion of simplicity.    The more complex it becomes, the more difficult it is to manage or extend.  Complexity is a necessary evil to make certain kinds of reporting possible or simplify user input, but it makes other possibly more fundamental reports an acrobatic exercise.     

A general rule of thumb is Do not make your application more feature-rich and complex than it needs to be.  Now flexibility in an application is a little different.  Flexibility in one area is different from flexibility in another area.  Some flexibilities have greater impact on complexity, ease-of use, features and even other flexibilities than others.  In fact increased flexiblity does not imply increased complexity or fewer features.  Generally the more you do the more you have to undo to apply it to a different problem domain and undoing work is not a trivial matter.  Another general rule of thumb: Only plan for zero cost flexibilities and let the rest just happen.  What does zero cost flexibility  mean - it means if you have 2 ways of doing the same thing that are equally difficult to achieve, pick the one that least confines your other choices such as choice in hardware, OS, space, speed, user knowledge etc.

How do you know how feature-rich and easy to use your application needs to be and in what areas does it need to be feature-rich or easy to use?  This is where report-driven design comes into play.  People report on things they care about and they hand-build reports by curling thru the data in things they really care about.  

Automatic vs. Stick Shift

In all design whether software, engineering,  or achitecture there are always compromises.  Perfect solutions are not viable in an imperfect world.   My favorite examples of compromise is the battle between Automatic cars and Stick Shift cars or a Windows OS/Mac and FreeBSD/Unix/Linux.

Stick shift cars have been around much longer than automatic, but stick shift cars are still made and auto connosoirs still buy them.  If you compare the design of the two cars, you'll notice the automatic car has a lot under the hood which is done for you, so therefore is much easier to use.  It is much more complex than a stick and more feature-rich, in many ways  but ironically  less flexible.  It has default answers to many scenarios which 90% of the time are the preferred answer.  These answers can't be easily overriden which makes it less flexible.  This is a feature however because you don't have to waste precious time answering the question or know how to maneuver the stick for that matter to answer the question.  It is less flexible because if you are in a new environment, that 90% answer could be completely wrong, you need to be able to override it. Its focus is ease-of-use and more features at the cost of increased complexity and loss in flexibility.

The stick shift car on the other-hand has a much simpler design under the hood, it is actually much more flexible as far as steering and driving is concerned, but its trade-off is that it is harder to use for the inexperienced or distracted driver.  It is more flexible in the sense that you can do things with it that you can't do with the automatic even though the stick design was not specifically designed for those things. It is less feature-rich in some respects because it won't drive for you or think for you.   You really have to pay attention when you are driving a stick.   The stick makes fewer assumptions about what you want to do therefore is much easier to trouble-shoot when something goes wrong.  It will work better in less rich environments.

People who just want to get from place a to place b or talk on the phone while driving often prefer automatics and people who want to experience the car as an extension of their soul prefer the stick.  Sometimes the same person may prefer a stick in some occasions and an automatic in another.

Same can be said with applications and operating systems.   Unix has been around a lot longer than Windows and Mac OS.  Windows and Mac OSes are very GUI intensive which makes them easier to use for a technically challenged or untrained person than a unix-like system, but very frustrating for a very technically skilled person because they can't pull off the hood as easily.  Some things are next to impossible to do without the GUI, whereas in a Unix like system - the GUI is the after-thought.  Unix was designed more for people who want to really be in control, don't want to compromise speed for convenience and who are not afraid to learn a plethora of shell commands to get results fast.  Its interesting to note that Linux is begining to cater more to the general user with such interfaces as GNOME, and that the new Mac OSX Darwin system is under the hood FreeBSD.

For some users being in control of their destiny is more important than ease of use.   For other users being in control of that particular part of their destiny is too distracting.  Sometimes users get tired of being in control of their destiny and just want the more-or-less obvious decisions to be made for them to simplify the hassle of clicking the "Yes" button.  An automatic-stick that hides the stick from the automatic preferring type user or can predict your mood  would satisfy both camps, but your complexity could be much higher since your application now has to look like a stick to a stick user and an automatic to an automatic user.  So when it comes to features  vs. complexity vs. ease of use, look at what kind of users that will be using the application and profile their stick-auto personality ratio for this particular kind of application.  Focus on why they really want to track this data and how they plan to use it.

In the next couple of examples I'll describe how Report-Driven Design can help flesh out the data model and the application flow of a system.

Report-Driven Design at Work

Dilemma 1: The battle between structured and unstructured data

One of the battles tackled over and over again when designing a data-intensive application is whether to make a particular field a pick list or a free-text field.  Lets say that you have an order application which you can structure in at least one of two ways.

  1. You can have a system comprised of an order table and a customer table.  All customers must be in the customer table before you can process an order for that customer. You either have to have your app auto-populate the customer table for a new customer - making it complex, or prompt the user that this customer is not in the system woud they like to add , also making your application more complex
  2. You can have a system comprised of just an order table. Simply just put in the order.  The downside users may spell the same customer in different ways thus making some kinds of reports less useful or they have to repeat information that can be easily looked up in the customer profile.

If you followed the tenents of data normalization and object orientation to the letter, you would go with option 1 and you would have a customer id field which is a number or short-string of some sort that is the primary key of your customer table and a foreign key in your orders table.

If you just wanted something fast, simple to use and in place fast or you were astute enough to observe that hey - they sell cars to families, they don't have too many repeat customers so an order and a customer are 99% of the time one and the same, you would go with choice 2.

If you looked at the reports that the car dealership was generating or has requested, you may have quickly realized that you could just go with 2.  First you would look at the reports and see a balance sheet report, profit report, but you wouldn't see a repeat customer report or customer profile report.  You would then ask the owner - "aren't you interested in how many repeat customers you have, or how much each customer has spent with you," and the owner would say - "we don't get many repeat customers until ten years down the line, and then I don't really care at that point".  Possibly he may answer "Hmm hadn't thought about that.  That could be interesting."   The "That could be interesting" may make you think to change your application slightly to have customer field as a combo box where they can type in new entries or pick from existing to copy.

On the otherhand - you may have looked at some old reports and noticed McCars Rent Cheap is a customer that appears lots of times and is sometimes spelled McCar.  You may notice that this particular dealership has only 10 customers.  Since entering customers is a rare occurrence, but putting in orders is common, it makes sense to go with approach one, because it saves on data entry as well as minimizing on mistakes in spelling thus making your reports much more useful and at the same time making data entry easier.

Dilemma 2: Columns verses Rows

Another common dilemma  is whether to store data in columns or rows.  A common example of this is a survey application. 

In a survey you have one of 3 common options

  1. store the answer to each question as a field in a table
  2. store the answer to each question as separate rows with possibly (or not necessary) a parent record in another table
  3. store the answer to some questions as fields in one table, and Q/A to other questions in a linked table
Again if you knew what reports were expected, it would greatly help you in deciding which way to go.  

Report 1 - single row per survey, each column  being  an answer to a question or average of an answer or fancy arrangement of the questions.
Report 2 - rollup report - separate row for each question, but each row gives total number of people that  responded yes to that and percentage.
Report 3 - survey profile - separate row for each question, answer to each question shows on each row.  Demographic info about survey participant shows at top.
If you had a simple survey that was only comprised of 5 or fewer questions or the customer insisted on a report like Report 1, storing each question as a separate field in 1 table would probably be the easiest to achieve.  In fact you would be able to achieve Report 1, Report 2, and Report 3 with equal difficulty.  The more questions you have in the survey, the more time-consuming your reports would be to create with standard report writers and screen design tools.

If you had a survey with lots of questions and you opted for 2 or 3, then Report 1 may be tricky in that it requires writing cross-tab queries, but Report 2 and 3 would be relatively easy since they take advantage of the general pattern of  (Question --> Answer   or Question --> Avg(Answer)) for each row.  For demographic information, or questions that may need to be listed in Report 1 but not others, you may store these as fields in your parent table since they wouldn't follow the pattern of the other questions, so setting up your  system using model 3, may provide the best balance.

The Rise of Business Intelligence

As stated earlier in this article, we are seeing a fundamental shift in technology.  The shift from generating and finding data to using that data in very creative ways to pinpoint trends, patterns and bottle-necks.   How we input our data no longer reflects how we output that data. With this new disconnect between input and output, there is now more than ever a need to pay attention to how your application and data architecture affects your output options.  At the end of the day the output is the most important thing.


Further Reading
RDD - Report Driven Design An interesting blog entry about Report Driven Design that shares similar philosophies as we have expressed here.
Business Intelligence Wikipedia article defining what consitutes Business Intelligence and some background history of it.
Being Analog: An Interview with Donald Norman In this interview with Donald Norman, author of The Invisible Computer. In this interview he describes the danger of creeping featurism and why information appliances designed to do just one thing are much less complex, easier to use, and more transparent.
Growing Up: Moving from Technology-Centered to Human-Centered Products A chapter from The Invisible Computer
Return of the Mac Paul Graham article describing how it seems a lot of hackers are switching back to the Mac.
Too much Tech in carsInteresting commentary on how technology in cars has gone too far.



Back to Home