What is a Database?In short, a database is a structural way of housing information. Most databases these days have a separation between presentation and storage. This means that how you display your data and what pieces of the data you display (e.g. the fields you display, the order in which you display the data) does not have to mirror the way the data is stored.
There are several flavors of databases - such as flat file databases, relational databases, hierarchical databases etc. The most common kind of database used for structured applications is the relational database which is partitioned into tables. These tables are then partitioned into rows and columns and tables can be related to each other to create views (or virtual tables that are comprised of fields from various related tables).
In this article, we will focus our attention on relational databases since those are the ones commonly needed in applications.
How do you judge a Database?A database system can be measured by several facets. We'll take a look at 4 common databases in use and see how they measure up. The main facets of a database management system that we think are important are listed below
- Will it run on my hardware or commonly available hardware?
- Speed - this is actually a multifaceted question because some databases are optimized for certain kinds of functions and will do well on those and not so well on others?
- How does it scale - will it degrade a lot when more users are added or more records are added?
- How complicated will it be to maintain - e.g. Backup, make structural changes to, changes to data and do I have the staff to support it?
- How feature reach is it? E.g. How extensive is its query language (most commonly SQL), does it have procedural language support
- How expensive is the database?
- Is it stable? E.g. does it crash often under heavy load or unexpected data?
- Desktop or Server Architecture - some databases are Desktop-based databases meaning the client computer does all the processing involved in querying, inserting records, updating and validation and the database is simply a file accessed with specialty software. Then there is the Server-based database. The Server based database is not just a file, but a whole Server component that listens for client requests and does the job of validating a client request and updating, querying the database. A server-based database tends to be faster and scales better to more users (if installed on an adequate server and programming utilizes the strengths of client-server) for several reasons. The data is local to it so it only needs to pass the requested amount of data to the client. Also if you have an intensive query the speed of the client is not as much of an issue since the server will simply pass the final result to the client.
Sampling of DatabasesWe’ll look at 4 kinds of databases - Microsoft Access, SQL Server, MySQL, PostgreSQL. Each fits a different market and has its strengths and weaknesses.
Microsoft Access 2000-2007Microsoft Access is well suited for applications that will be used by all windows users, applications used by fewer than 10-15 simultaneous users, and where the largest table in the database has fewer than (150,000 - 500,0000 records; Note this number varies depending on how wide each row is and the datatypes it holds for example we have successfully created access databases that have over 1 million records of payment data and still behave speedly for updating and rolling up this data and being accessed by 10 people simultaneously. It is also an all-in-one database in that it comes with its own form designers and programming environment which can be utilized.
|Hardware||It runs on Windows and Macintosh workstations with Microsoft Access or Microsoft Access run-time installed.|
|Speed||Depends on the processor speed and memory of your client pcs, the number of users connected at any one time to the database (it tends to degrade after about 10 or so users), and Local Area Network speed (LAN).|
|Robustness||You should never run an Access database unless you have at least 10 Mbps speed to the client and your network connection is reliable (meaning it doesn't disconnect often). Since MS Access is a client-side database, it is very prone to corruption if network speed to clients is lack-luster or if users each have different versions of the JET/Access drivers installed.|
|Maintenance||Relatively simple to maintain for savvy computer users. Backup is a matter of just backing up the .mdb file. Making structural changes is relatively easy to do and just requires Microsoft Access.|
|Scale||In general an Access database application degrades faster than other databases we will discuss - the more users you add or the more records you add to a database. It starts becoming unusable at around 20 users or 500000 records. Note this depends primarily on processor speed of clients and network speed.|
|Features||It is fairly SQL compliant and has some additional features you will not find native in other databases. PIVOT tables for example are a very powerful feature and exists in all versions of Microsoft Access. In fact the PIVOT feature in MS Access is much more powerful than what you find in SQL Server 2005. The query designer that comes with MS Access shields the user from much of the complexity of SQL, but one can go into SQL view to directly change the SQL. Functions defined in Access VBA modules can be used in queries (if MS Access is used as the front-end to the database - they will not work from web apps or in development environments such as Visual Basic, Delphi etc.). It also supports foreign keys and cascade update and delete, but lacks triggers.|
|Price||Access run-time is free if you have a developer license, but limits what the user can do. Access can also be used as a back-end database to other apps such as apps written in Visual Basic, Delphi or C++ etc or to web apps. In this case no Access run-time is needed, just the adodb or jet drivers. These drivers commonly already exist on Windows machines. Many organizations already have Access installed if they have Microsoft Office Professional and higher installed on client pcs|
|Architecture||Access is by design a Desktop database. If used in a Web application though - it behaves as a server database because the web server is the only client so controls all the access to the database. In a web environment it doesn't scale well past 20 simultaneous connections, but this is simultaneous so actually can support quite a bit of traffic. Also note that if you are running a web server such as IIS you do not need to install Microsoft Access or Office on the server (for MS Access database format below 2007). The drivers needed to update the database are already installed or can be installed by installing the free Microsoft Data Access Objects (MDAC) or ODBC drivers. The other interesting thing about Microsoft Access is that if you have the full product, you can use it to access other databases and use the forms, queries etc. as front-ends to other database such as Microsoft SQL Server, PostgreSQL, MySQL etc. It makes a great RAD environment when you want to whip out a form against pretty much any database platform or you want to write both simple and complex reports against any kind of database.|
Microsoft SQL Server 2000/MSDE Microsoft SQL Server 2005/SQL Server 2005 ExpressMicrosoft SQL Server is a server-based database. It runs exclusively on Windows servers/workstations so is best suited for Windows environments.
|Hardware||It runs on Windows machines. The client computer can be in theory anything. For Desktop applications, you will generally need a windows computer to access SQL Server, for web applications it is less important, but runs best when a windows server is querying the database.|
|Speed||Depends on the processor speed and memory of your Windows server, the number of users connected at any one time to the database and complexity of your queries.|
|Maintenance||The non-free version comes with a Database Maintenance plan wizard that makes scheduled indexing, backup, and fixing minor errors a snap. Also 3rd party products such as Veritas, CA ArcServer come with SQL Server backup agent add-ons. Comes with built-in replication. Making structural changes to tables is relatively easy to do, but be careful when making changes on tables used in views. Views that use SELECT * will need to be rebuilt otherwise you get weird behaviors like column names shifting from corresponding data, and the system doesn't warn you. Also if you drop columns views dependent on these tables will break and will need to be rebuilt.|
|Scale||Generally scales well the more users and records you add. There are additional costs associated with adding more users unless you go with the per processor pricing scheme.|
|Features||It is fairly SQL compliant. The SQL 2005 version introduced PIVOT, UNPIVOT predicates used for creating cross-tab queries or converting cross-tab queries back to row form and support for programming stored procedures and User Defined functions (UDFs) in .NET languages. An additional feature that SQL 2005 introduced is the CROSS APPLY predicate which allows you to do fairly sophisticated joins against functions that return tables and take their arguments from table field values. SQL Server 2000 lacks creation of user-defined types that you will find in PostGreSQL and Oracle. The SQL 2000 version only supports creation of immutable user-defined functions (UDF) (a UDF with same arguments can not return a different value from execution to execution - e.g. can't be dependent on varying elements such as time, but this is changed in SQL 2005 so that you can write such functions).|
|Price||The MSDE version and SQL Server 2005 Express versions are free but doen't come with Enterprise Manager or SQL Server 2005 Mangement Studio and is limited to 5 simultaneous connections and 2 GB size database. The new MSDE replacement - SQL 2005 Express have fewer limitations - e.g. 4 GB database size limitation, 1GB addressable RAM, no simultaneous connection limitation and has a stripped down database administration tool called SQL Server 2005 Management Studio Express. It is however limited to using only one processor (although it can be installed on a server with more than 1 processor) and will lacks an SQL Agent for scheduling jobs such as backup etc. SQL 2005 Express is free and can be embedded into applications free of charge as well. For standard SQL Server - expect to pay $2000 and up depending on the number of users you have. |
Another interesting thing to note about SQL 2005 Express is that it can be referenced by the database device file names and dynamically added to an SQLExpress Engine and pretty much all the functionality in terms of the SQL syntax,functions, and ability to create Comman language runtime (CLR) procedures and functions you have available in the various SQL Server 2005 non-free versions you have in the express, so it makes express a suitable version for replicating with a master SQL Server 2005 version. This is similar to the way SQLite works. All Express and MSDE run in 32-bit mode while SQL Server 2005 can support 64-bit.
|Architecture||SQL Server is a server database. Much of the database logic can be changed within the database using database triggers, user-defined functions, stored procedures without having to release changes to the client.|
MySQLMySQL is a server-based database. It runs both on Windows as well as Unix/Linux environments. It is most well-known as a web database, but can be used in non-web environments as well.
|Hardware||Can run on Windows, Unix, Linux. Very easy to install.|
|Speed||Depends on the processor speed and memory of your server, the number of users connected at any one time to the database and complexity of your queries. Compared to several other databases, its querying speeds tend to be better, and its updating speeds are on par or below many other server-based databases. This is changing with newer versions.|
|Maintenance||Comes with command-line utilies for backup and restore that can be used with scheduling agents for automated backup. Lots of very good free and non-free products on the market for administering the databases. Making structural changes is fairly easy to do with these tools.|
|Scale||Generally scales well the more users and records you add. Usually not used for databases above 100GB in size, but 1TB databases do exist and work well. Benchmarks have shown it does better than Microsoft SQL Server and on par with Oracle, but this varies based on database drivers used, number of connections, OS, etc.|
|Features||It is fairly SQL compliant, but is weak in many features compared to Microsoft Access, PostGreSQL and SQL Server. Versions prior to 4.0 completely lack subqueries and views. Versions 4.1 and above, below 5.0 have subqueries, but still lack such things as views, check constraints, foreign key constraints (except InnoDB version). Version 5.0 introduced views and greater support for stored procedures. Versions prior to 4 completely lack stored procedures, stored functions UDFs (although you can bind external libraries to the database) and triggers. Mysql 4.1 has introduced GIS spatial types for doing basic Geospatial queries, but this is very limited compared to PostGIS add-on for PostGreSQL.|
|Price||MySQL is an open-source database. However it is not a necessarily a free DBMS. It is available via an Open Source (GPL) license or a commercial license. Commercial licenses start at around $595 per server. If you are using it for commercial projects, then you need to purchase the commercial license. For internal apps there is a grey-zone as to when you need the commercial license and when you don't.|
|Architecture||MySQL is a server database. Because of the lack of stored procedures, stored functions, constraints etc.in lower versions, much of the logic has to be encoded in applications that use it.|
PostgreSQLPostgreSQL is a server-based database. It is primarily a Unix/Linux database, but version 8 and above have native MS Windows version complete with the PostGIS spatial extender.
|Hardware||Can run on Windows, Unix, Linux and Mac OSX. Fairly easy to install on par with MySQL installation. Note since PostGreSQL is so feature rich, there is a lot that can be configured to improve performance.|
|Speed||Depends on the processor speed and memory of your server, the number of users connected at any one time to the database and complexity of your queries. Compared to several other databases, its querying speeds tend to be better, and its updating speeds are on par or better than many commercial server-based databases. Lacks out of the box replication, but available thru 3rd party tools some of which are also free (e.g. Slony, Mammoth PostgreSQL Replicator)|
|Maintenance||Comes with command-line utilies for backup and restore that can be used with scheduling agents for backup. Making structural changes to tables is not quite as easy as MySQL, SQL Server and Access. Structural changes may require rebuilding views. The system will warn of this and not allow you to change until you drop a view etc. As of version 8, tables can not be dropped if dependent objects such as views rely on them. This is both a good and a bad thing. Also CREATE OR REPLACE VIEW command doesn't work if you are adding or deleting fields from a view. Such changes require dropping the view as compared with SQL Server ALTER VIEW that allows adding of fields to views without dropping the view. The version 8 and above comes packaged with its own agent scheduler called PgAgent that can be used to schedule and execute stored procedures as well as OS batch jobs such as backing up of the databases.|
|Scale||Generally scales well the more users and records you add. Usually not used for databases above 100GB, but 1TB databases do exist and have been reported to work well. Benchmarks have shown it does better than Microsoft SQL Server 2000 and below and on par with Oracle, and update speeds better than most versions of MySQL. This varies based on database drivers used, OS, etc. Database size is unlimited and 1 TB databases are not unheard of.|
|Features||It is very SQL compliant, and has advanced features such as triggers, transactions, subqueries and correlated subqueries (both in IN and in FROM clauses), stored procedures/functions (UDFs) (both immutable and volatile - NOTE MS SQL Server 2000 only supports static stored functions SQL Server 2005 does support non-static stored functions (these are functions that can rely on things such as current time)), foreign key constraints, cascade update/delete, user-defined data types, inherited tables, and OGC compliant advanced GIS functions (via PostGIS). PostGreSQL also supports stored procedures and functions in additional languages such as Perl, PHP, .NET, Pl/R languages. These however are 3rd-party features requiring additional installs and many may not be production quality. |
PostGreSQL is a case-sensitive database both in column names and data which can be a bit annoying if you are used to case insensitivity found in products like SQL Server, MySQL, and Access. For example column names are case sensitive and if you create columns where column names are not all lower case, you will need to quote them. So it is best to just keep all your fields lower case. Data is case sensitive too - e.g.
Sorting on these fields may not yield what users expect since it does a case sensitve sort.
However you can get around some of this by using regular expressions, UPPER, LOWER compare or the ILIKE predicate with loss in index efficiency or using the new citext datatype. E.g.
|Price||PostGreSQL is a free and open-source database. It is licensed under the BSD license which allows it to be packaged in commercial products without additional charge or restriction. There are commercial versions from companies such as CommandPrompt, dbExperts, and SRA which have additional features and support. EnterpriseDb is a relatively new PostGreSQL company whose namesake product tauts Oracle compatibility and Oracle features.|
PostgreSQL is a server database with a plethora of features that you would find in commercial databases such as MS SQL Server, Oracle, DB2, and Sybase. The new version 8.2 also sports an advanced data recovery system which will allow recovering a database at point in time and ability to span a databases on multiple disks.
The one feature that makes PostgreSQL unique among databases I have worked with is that it supports numerous procedural languages (Perl, PerlU, PLR, PLPHP, SQL, PgSQL) . Even Oracle can't say that since it only supports PL and Java. Microsoft SQL Server 2005 comes close by supporting CLR .NET functions, but the pains you have to go thru to write a simple .NET function in Microsoft SQL Server 2005 and install it is a lot compared to the simplicity of writing such functions in PostgreSQL with your procedural language flavor of choice.