|Back to Papers and Articles|
Databases: Past, Present, and Future
Copyright 2003 Paragon Corporation ( June 13, 2003)
What is a database and why do we have them?In the simplest definition of the word, a database is a storage container for data. A database system is a storage container for data as well as a mechanism for accessing that data.
There are several kinds of databases we can think of - you can think of a spreadsheet as a database, a mail server, a word document, a relational server based database, desktop database such as Microsoft Access. Even a book is a database. The World Wide Web (WWW) you can think of as an intricate network of databases.
Databases have existed since the beginning of civilization and in fact define civilization, but were initially not electronic in nature. When man needed to store knowledge or keep track of information, they wrote them down, cataloged them using paper indices. So the book was the very first kind of database. These were not electronic databases, but nevertheless served the same purpose. They were used to track ledger accounts, scientific knowledge, and historical fact. Today when we think of databases, we think of electronic databases - not these hard copy items that defined civilization so many thousands of years ago.
Databases serve a couple of purposes.
The Birth of the Relational Database
When people speak of true databases -- they often refer to more structured electronic databases such as relational databases, object databases, OLAP databases, and spatial databases. These kinds of databases can trace their roots back to the 1970 when E.F. Codd from IBM introduced the idea of relational model of databases in his paper entitled "A Relational Model of Data for Large Shared Banks". Before that time most business databases were based on the network model or simple flat file structure.
The relational model
was based on mathematical set theory. It served a couple of purposes
IBM's System-R database was born from this work, but was largely ignored by IBM, and then very shortly Oracle came out with their commercial database based on E.F Codd's relational theory and research from System-R and research in Berkely Ingres. The Berkeley Ingres research project was also started at this time and strived to extend the relational model to deal with more complex data. Many object and object-relational databases can trace their roots back to Ingres. Oracle beat IBM to market with its relational database and made billions.
Other relational databases began to sprout from these early pioneers - Informix, Sybase, and Berkeley Ingres project later gave birth to the Berkeley Postgres which strived to add more Object-Oriented features to the relational model (both projects were headed by Michael Stonebraker) which then transformed into PostgreSQL. System-R gave birth to DB2.
Catalog of Various kinds of databasesThe relational model and SQL gave birth to other models which we will briefly summarize in the next couple of sections. The main ones that come to mind are Object databases, Object-Relational, Spatial Database, OLAP databases and very recently XML databases.
The object database model followed shortly after the relational database model. This model was developed to compensate for the deficiencies of the relational database model -- namely to construct and query complex datastructures without having to denature them into a relational 2-dimensional structure. Object databases use a language very similar to SQL called OSQL (object sql). This bares a striking similarity to SQL e.g.Below is an example of an OSQL statement that lists all orders for the Smith Company
The object database proved to be too complicated for most purposes and it was also (at least initially) much slower than relational databases. As a result - there are very few pure object databases in this world. Most databases provide a compromise between the relational model and the object model. Examples of object-relational databases are Oracle and Postgres. These allow you to define complex datatypes and store them in tables but still tout the standard relational primitive types.
Spatial databases were developed to correlate data in space. They provided answers to questions such as how much money have we spent within a 20 mile radius from this specific location? How far has waste product extended from the spill location? How many miles away is the closest hospital to this house?
Most spatial databases don't stand on their own, but instead are just an extension to relational databases. They use a dialect of SQL called Simple Features Specification for Structured Query Language (SFSQL) - which simply adds spatial functions to SQL - such as distance, touches, centroid, inside, area, extent.
In fact most spatial databases store spatial data in relational databases, but in specialized fields used to hold spatial data. Examples of spatial databases are Oracle Spatial (which sits on top of Oracle) , ESRI Arc SDE (which can sit on top of a Microsoft SQL Server or Oracle database), PostGIS (sits on top of PostgreSQL), DB2 Spatial Extender which adds spatial functionality to IBM DB2 databases, and even MySQL is providing limited functionality for Spatial data in its upcoming 4.1 version.
Below is an example of what a spatial query looks like. This is using the Postgis dialect of SFSQL, but the syntax is pretty similar in other spatial databases.
The above example would provide a listing of all police stations that are closest to a building called ‘MY BUILDING’.
On-line Analytical Processing Databases (OLAP)OLAP databases are geared toward analyzing data rather than updating data. They are used to drive business processes based on statistical analysis of data and what-if analysis. The main feature of OLAP databases is speed of querying and multi-dimensionality.
Most real OLAP databases allow you to slice data into an infinite number of dimensions - e.g. by time, product line, and sales groups. These databases are fed most often by relational databases.
Many OLAP databases have their own dialect of SQL specifically designed to deal with the multi-dimentionality of OLAP data. One example that comes to mind is Microsoft SQL Server's Analysis Services which uses a variant of SQL called Multi-Dimensional Expressions Language (MDX). Below is an example of what that looks like.
XML DatabasesXML Databases are the new kid on the block. They are basically a new take on the old idea of Object databases. It too has its own dialects of query languages - one called XQL and several other dialects. In an XML database, data is stored in XML blobs and manipulated using XPATH like statements. Below is an example of an XQL
Which would list a grouping of invoices with products as subgroup.
Databases - The FutureDatabases do a good job of storing data and recounting data, but still often rely on a human observer or external program to make correlations and decisions on this data.
The next frontier for the database has started, but is still in its infancy. There are a couple new exciting things happening in the database industry.
Enhanced Decision Logic in the DatabaseOne is the marrying of database storage with decision support. Basically when a database can make correlations of its own with minimal human intervention.
There is a growing trend now to provide more sophisticated programming logic within the database structure. Initially databases only controlled what kind of data could be put in fields -- e.g. if you defined a field as a DateTime type, it would cough if you tried to insert something other than a date. Then databases became more sophisticated and had features such as triggers, cascading update and delete etc. - which prevented updaters from creating inconsistencies between tables. Databases also developed a simplified procedural language that contained embedded SQL with some looping and control structures - e.g Sybase/SQL Server's transact SQL, Oracle's PL/SQL, PostgreSQL pgsql.
Now databases are beginning to take on even more complex logic. Oracle 10i touts the ability to write database procedures in Java, the new Microsoft SQL Server 2005 will support calling .NET assemblies and classes in the database. PostgreSQL already supports sophisticated languages such as Perl, Python, and PL/R for writing database functions.
Data Any Place (Distributed Databases)
Another feature of databases that is being expanded is the concept of separation of location from the abstract concept of the database itself which Codd defined long ago. Now we see this in full-force where you have the concept of a database - that lives in more than one location, but can be queried as a continguous unit. These are called distributed or federated databases. You can have a portion of a database in New York and another in Boston and run a query to count all the customers which would then query both locations. This is made possible because network speed is increasing.
More Complex Data - The Star Trek GenerationThe data we store is becoming increasingly more complex as discussed earlier we now store spatial data with it’s own need for specialized functions. We store image data, scanned in data, complex medical data such as gene sequences. We now have devices such as medical devices that record physical data and transform it to a digital format that can be processed by computers. We have begun to process this data to derive patterns - e.g. the DNA sequences that will cause a particular disease.
Perhaps not in the near future but possbily in the distant future, medicine and databases will become advanced enough that we can store digital replicas of ourself. The idea of the Star Trek transporter transporting from one location to another may then become a simple matter of radio waves transporting digital data that defines the essense of what we are to a distant planet.
Back to Papers and Articles