Why do we build database applications?
One of the main functions of database-driven applications is to produce useful reports. We spend all that effort collecting information with the hopes that with the click of a button, we can generate reports that tell us how much our customers owe us, how much profit we are making, where are the bottle-necks in our process, who are the best customers, or what are our high-priority items.We also build applications to do ad-hoc queries for example to quickly bring up the profile of a customer or set of customers, or to find out where is an order given some quick inputs such as the customer type, entry date. These can be considered reports too.
Ironically when designing a database application, reports are the last thing on an application architect's mind and instead are relegated to the "Do that after we've got the data model" bin. We focus more on the laundry list of things we want to track instead of how we want to use this tracked information. We can't have reports until we've got data to report on. This is a bit of a tragedy because a lot of the database and application design could be simplified/optimized with a basic understanding of the reports that are expected of the system. There are many ways to store data in a database, and some approaches may provide more features and ease-of-use in certain areas, but at a cost. How you structure your application and data can make certain calculations difficult to compute or a specific report layout difficult to achieve. Knowing about the needed reports before-hand can help drive the data model and application design and save a lot of effort later on down the line.
The need for Report-Driven Design
We often hear about the terms model driven architecture, service-oriented architecture, feature driven development, and test-driven development. Why not report-driven? Reports are something that everyone knows is needed, something that an end-user can understand, and something that is not filled with arcane terminology such as objects, components, paradigms, entity relational diagrams, and UML. It is so close to the problem that we just simply take it for granted and overlook it. Historically, there was not much flexibility in reporting. The way your screen looked was pretty much what you could expect from your report with some slight shifts in presentation. When you would show a user a screen you were in effect designing the report as the application was evolving. Something happened with technology down the line and this was no longer true. The balance changed from merely collecting data to using that data more effectively to drive business change. In the minds of many users, this subtle change in trend has not caught on although most application architects are quite aware that the design of your screens and your underlying data structure does not need to look anything like the final output. However the design of your screens and data-structure does control what kind of report presentations you can easily produce.
Reporting and Business Intelligence as a
whole is a relatively new phenomenon. In the past computers were
not reporting tools. They were simply expensive typewriters that
saved you from using white-out. When you wanted to find out
information you pulled out the files from a filing cabinet. When
you wanted a summary report, you rummaged thru your files and counted
them and wrote it in a document. Then
we moved on to using the computer filing system as a virtual filing
cabinet. Then we realized now that we've got so much virtual
space and so many automated ways of producing data, we can store so much more to the point that we can't find
anything anymore so we need a better way to
organize and find data. Now that we could store more data
and were generating more data we needed to aggregate this data into
more meaningful nuggets to prevent information overload. Search
engines such as Google and AltaVista and advancements in AI and search algorithms
improved our ability to find relevant data. Now we are reaching the
next plateau - the presentation of meaningful data in a way that
makes it easy to grasp trends and abnormalities with a quick
blink of an eye. This new age is often referred to as Business Intelligence. Counting
files and manually creating reports from relevant data
is too time-consuming of an exercise and just doesn't cut it anymore.
Flexibility, Features, Ease Of Use And ComplexityThere is a constant struggle between flexibility, features, ease of use, and complexity. Before I go any further, let me define what I mean by these terms.
Flexibility - the ability of an application to be easily adapted to another use or another environment that was not predefined in the onset
Features - bells and whistles - what an application can do and was designed to do. How quickly does it speed up otherwise manual processes.
Ease of Use - how easy it is for an untrained or technically challenged user to use the system. How intuitive is the system to use.
Complexity - How difficult an application is to maintain, fix when problems arise or extend as business need changes
The challenge is that we want infinite flexibility, infinite features, a self-intuitive system that your grandmother can use and at a minimum complexity under the hood. The more feature-rich your application, the more complex it becomes or the more difficult it is to use. The more features it has generally the less flexible it becomes in certain areas (for example it won't work on your beat-up 386 pc or fit on your floppy, now you need a server farm of quad-processors), and the more flexible it is in other areas - you can control your new vcr with it because it has a usb port and your vcr has a usb port. The easier an application is to use, the less feature-rich it is (it can do only one thing for example, although it may do it really well so there is not much to learn), or the more complex the underlying structure that does automatic calculations and input to give the illusion of simplicity. The more complex it becomes, the more difficult it is to manage or extend. Complexity is a necessary evil to make certain kinds of reporting possible or simplify user input, but it makes other possibly more fundamental reports an acrobatic exercise.
A general rule of thumb is Do not make your application more feature-rich and complex than it needs to be. Now flexibility in an application is a little different. Flexibility in one area is different from flexibility in another area. Some flexibilities have greater impact on complexity, ease-of use, features and even other flexibilities than others. In fact increased flexiblity does not imply increased complexity or fewer features. Generally the more you do the more you have to undo to apply it to a different problem domain and undoing work is not a trivial matter. Another general rule of thumb: Only plan for zero cost flexibilities and let the rest just happen. What does zero cost flexibility mean - it means if you have 2 ways of doing the same thing that are equally difficult to achieve, pick the one that least confines your other choices such as choice in hardware, OS, space, speed, user knowledge etc.
How do you know how feature-rich and easy to use your application needs to be and in what areas does it need to be feature-rich or easy to use? This is where report-driven design comes into play. People report on things they care about and they hand-build reports by curling thru the data in things they really care about.
Automatic vs. Stick ShiftIn all design whether software, engineering, or achitecture there are always compromises. Perfect solutions are not viable in an imperfect world. My favorite examples of compromise is the battle between Automatic cars and Stick Shift cars or a Windows OS/Mac and FreeBSD/Unix/Linux.
Stick shift cars have been around much longer than automatic, but stick shift cars are still made and auto connosoirs still buy them. If you compare the design of the two cars, you'll notice the automatic car has a lot under the hood which is done for you, so therefore is much easier to use. It is much more complex than a stick and more feature-rich, in many ways but ironically less flexible. It has default answers to many scenarios which 90% of the time are the preferred answer. These answers can't be easily overriden which makes it less flexible. This is a feature however because you don't have to waste precious time answering the question or know how to maneuver the stick for that matter to answer the question. It is less flexible because if you are in a new environment, that 90% answer could be completely wrong, you need to be able to override it. Its focus is ease-of-use and more features at the cost of increased complexity and loss in flexibility.
The stick shift car on the other-hand has a much simpler design under the hood, it is actually much more flexible as far as steering and driving is concerned, but its trade-off is that it is harder to use for the inexperienced or distracted driver. It is more flexible in the sense that you can do things with it that you can't do with the automatic even though the stick design was not specifically designed for those things. It is less feature-rich in some respects because it won't drive for you or think for you. You really have to pay attention when you are driving a stick. The stick makes fewer assumptions about what you want to do therefore is much easier to trouble-shoot when something goes wrong. It will work better in less rich environments.
People who just want to get from place a to place b or talk on the phone while driving often prefer automatics and people who want to experience the car as an extension of their soul prefer the stick. Sometimes the same person may prefer a stick in some occasions and an automatic in another.
Same can be said with applications and operating systems. Unix has been around a lot longer than Windows and Mac OS. Windows and Mac OSes are very GUI intensive which makes them easier to use for a technically challenged or untrained person than a unix-like system, but very frustrating for a very technically skilled person because they can't pull off the hood as easily. Some things are next to impossible to do without the GUI, whereas in a Unix like system - the GUI is the after-thought. Unix was designed more for people who want to really be in control, don't want to compromise speed for convenience and who are not afraid to learn a plethora of shell commands to get results fast. Its interesting to note that Linux is begining to cater more to the general user with such interfaces as GNOME, and that the new Mac OSX Darwin system is under the hood FreeBSD.
For some users being in control of their destiny is more important than ease of use. For other users being in control of that particular part of their destiny is too distracting. Sometimes users get tired of being in control of their destiny and just want the more-or-less obvious decisions to be made for them to simplify the hassle of clicking the "Yes" button. An automatic-stick that hides the stick from the automatic preferring type user or can predict your mood would satisfy both camps, but your complexity could be much higher since your application now has to look like a stick to a stick user and an automatic to an automatic user. So when it comes to features vs. complexity vs. ease of use, look at what kind of users that will be using the application and profile their stick-auto personality ratio for this particular kind of application. Focus on why they really want to track this data and how they plan to use it.
In the next couple of examples I'll describe how Report-Driven Design can help flesh out the data model and the application flow of a system.
Report-Driven Design at Work
Dilemma 1: The battle between structured and unstructured data
One of the battles tackled over and over again when designing a data-intensive application is whether to make a particular field a pick list or a free-text field. Lets say that you have an order application which you can structure in at least one of two ways.
- You can have a system comprised of an order table and a customer table. All customers must be in the customer table before you can process an order for that customer. You either have to have your app auto-populate the customer table for a new customer - making it complex, or prompt the user that this customer is not in the system woud they like to add , also making your application more complex
- You can have a system comprised of just an order table. Simply just put in the order. The downside users may spell the same customer in different ways thus making some kinds of reports less useful or they have to repeat information that can be easily looked up in the customer profile.
If you followed the tenents of data normalization and object orientation to the letter, you would go with option 1 and you would have a customer id field which is a number or short-string of some sort that is the primary key of your customer table and a foreign key in your orders table.
If you just wanted something fast, simple to use and in place fast or you were astute enough to observe that hey - they sell cars to families, they don't have too many repeat customers so an order and a customer are 99% of the time one and the same, you would go with choice 2.
If you looked at the reports that the car dealership was generating or has requested,
you may have quickly realized that you could just go with 2. First you would
look at the reports and see a balance sheet report, profit report, but you wouldn't
see a repeat customer report or customer profile report. You would then ask the owner - "aren't you interested
in how many repeat customers you have, or how much each customer has spent with
you," and the owner would say - "we don't get many repeat customers until ten years
down the line, and then I don't really care at that point". Possibly he may
answer "Hmm hadn't thought about that. That could be interesting."
The "That could be interesting" may make you think to change your application slightly
to have customer field as a combo box where they can type in new entries or pick
from existing to copy.
Dilemma 2: Columns verses Rows
Another common dilemma is whether to store data in columns or rows. A common example of this is a survey application.
In a survey you have one of 3 common options
- store the answer to each question as a field in a table
- store the answer to each question as separate rows with possibly (or not necessary) a parent record in another table
- store the answer to some questions as fields in one table, and Q/A to other questions in a linked table
Report 1 - single row per survey, each column being an answer to a question or average of an answer or fancy arrangement of the questions.
Report 2 - rollup report - separate row for each question, but each row gives total number of people that responded yes to that and percentage.
Report 3 - survey profile - separate row for each question, answer to each question shows on each row. Demographic info about survey participant shows at top.
If you had a simple survey that was only comprised of 5 or fewer questions or the customer insisted on a report like Report 1, storing each question as a separate field in 1 table would probably be the easiest to achieve. In fact you would be able to achieve Report 1, Report 2, and Report 3 with equal difficulty. The more questions you have in the survey, the more time-consuming your reports would be to create with standard report writers and screen design tools.
If you had a survey with lots of questions and you opted for 2 or 3, then Report 1 may be tricky in that it requires writing cross-tab queries, but Report 2 and 3 would be relatively easy since they take advantage of the general pattern of (Question --> Answer or Question --> Avg(Answer)) for each row. For demographic information, or questions that may need to be listed in Report 1 but not others, you may store these as fields in your parent table since they wouldn't follow the pattern of the other questions, so setting up your system using model 3, may provide the best balance.
The Rise of Business Intelligence
As stated earlier in this article, we are seeing a fundamental shift in technology. The shift from generating and finding data to using that data in very creative ways to pinpoint trends, patterns and bottle-necks. How we input our data no longer reflects how we output that data. With this new disconnect between input and output, there is now more than ever a need to pay attention to how your application and data architecture affects your output options. At the end of the day the output is the most important thing.