The Future of the Database
24 Sep
New Groups of Users
Within the last few years, relational database management systems
(RDBMS), such as Oracle and Sybase, has seen a whole new class of
users. Up until recently, an RDBMS was primarily a tool for the big
corporations. They’d use it to manage their payroll, for production
management, for financial management, etc. In short, mission-critical
applications where data integrity and security is far more important
than flexibility or cost. Joe Nerd had no interest in using a database
for his own stuff. There’s no way that he could ever create enough
data all by himself, to warrant the huge cost and pain of installing
and maintaining Oracle.
Enter the web, and the picture changes. Now, Joe Nerd wants to
run a web site. Any interesting web site includes a way for
the users of the web site to collaborate. Joe Nerd may own and run it,
but the users of the web site share their comments, talk to each
other, and in general contribute to the contents of the web site. To
do that, you need to store your data somewhere and be able to retrieve
them dynamically, as a browser requests a web page.
Databases are extremely well suited to this type of
application. They take care of all the concurrency issues and
does transaction management. Since only the RDBMS touches the files,
the risk of corrupting the data files are low. And the query optimizer
will make your data retrieval life easy. In short, you can’t really
build an interesting web site without having a database sitting behind
it.
This changes the user base for databases. The corporate MIS department
developer gets company from Joe Nerd in the programmer user space. In
the end user space, the user of the final application, the corporate
staff gets company from Jane websurfer. As happened with the PC and
with the Internet, when a technology is exposed to a new group of
users, radical changes happen. But what changes?
Open Source Databases
The new users will be the type of people disposed for
contributing to an open source project. These people haven’t
really had any use for databases before, but they do now. Since open
source programmers primarily program for their own needs, they’ll
start hacking on relation database management systems. There is
already one RDBMS that’s open source from the ground up, namely <a
href=”http://www.postgresql.org/”>PostgreSQL. And <a
href=”http://www.interbase.com/”>InterBase, previously a
secret-source project, has had its source opened, although not under
GPL. The secret-source
database vendors, such as Oracle and Sybase, will begin to feel the
pressure from open source competition within the next few years.
New Demands
But the new types of users also has different demands than the
old ones did. The applications used in banks don’t change
much over time: Their primary concern is data integrity and security
over flexibility. The programmers working for banks are used to
old-fashioned development tools. The users of banking applications are
used to arcane interaction experiences. And banks are wealthy enough
that they don’t mind having to spend a lot of resources on maintaining
a database server. Needless to say, this is not the case for Joe Nerd
and his web application.
Flexible Data Models
On the web, the name of the game is change. A web application
is constantly evolving in response to user needs. Often
times, you even want the users of your web site to be able to define
and evolve their own data models, a’ la a web-based MS Access
application.
Current RDBMS software like Oracle and Sybase don’t support this
scenario very well. They are designed for static data
models. They’re very narrowminded as to what types of changes
you’re allowed to make. More importantly, changing the data model
isn’t transactional, so you can’t group two or three changes together
and make sure that either all or none of them are executed. This is
very critical, as there may be a programming error between the second
and the third change, so only the two first are executed. These first
changes have to somehow be undone, or we’ll have them lying useless
around forever.
With current RDBMS software, the developer can chose to bite the
bullet, do the data model changes very carefully, and be
prepared to manually untangle the mess that will ineviably arise from
time to time, when the changes break. Or he can chose to re-implement
part of what the RDBMS was supposed to do on top of the RDBMS, which
will be slower, less powerful and more work than having the RDBMS do
its job.
The market will demand flexible data models. The
RDBMS will have to acommodate it by making data definition more
flexible and, most importantly, transactional. We need it, if we are
to deliver high-quality applications to the end users, and it makes a
lot more sense to put it into the database layer, rather than to have
every application invent its own ad-hoc solution on top.
Modern Programming Environment
I’ve worked a lot with Oracle lately, and it’s just soooo
seventies. Oracle’s proprietary PL/SQL language lack modern
language features and doesn’t support compound data types. And
the development environment, i.e. getting your code into the RDBMS and
maintaining it once its in there, is very cumbersome.
Developers coming from a background in hacking Perl, Tcl, Java using
modern development tools will have a hard time getting used to
that. They won’t understand why they can’t write their code in
Perl, Tcl, Java or C, just because it runs inside the database
server. And indeed, with PostgreSQL, you can actually write
your stored procedures in Tcl. And there are ways to get Java to run
inside Oracle, though the process of getting it in there and
maintaining your code in the database is even more cumbersome than for
stored procedures written in PL/SQL.
Even more, with databases being used for the web, the data
we’re storing in the database is more and more strings.
Consequently, the RDBMS will have to get better at managing and
manipulating strings. Currently, Oracle has some strange limitations,
where you have to completely change your syntax if a string might
exceed 4000 characters. And string manipulation in PL/SQL is
horrible. This will have to change.
Supporting Explorative Use
The undo function of most desktop applications is one of the most
important functions—and one of the hardest to implement on top
of an RDBMS. The reason it is so important is that it
encourages exploration by the user. The user can
learn the application by trying, confident that he can undo any
changes that he regrets.
Databases already have a wonderful tool for this in the
transaction: You can make lots of changes to the database,
but until you decide that you’re done and commit the changes, you
still have the option of undoing everything by issuing a rollback
command. But you can only undo what you’ve done so far in this
transaction. The instant you’ve committed, you can’t change your mind
in any other way than by manually and carefully reversing your
actions.
I’d like to see a RDBMS that’ll let me pick an old (but not
too old) transaction and have that undone. This will
make it easier for applications to offer a general undo function to
users, and users will be happier. To implement an undo feature in a
database application today is very hard work.
The tricky thing here is the interaction among multiple
users. What if I post a question on a discussion forum, some
other user answers my question, and then I want to undo? Well,
basically, I can’t. That’s the hard fact of life. If you say something
out loud and someone else acts on it, it’s too late to take it
back. But if noone’s yet acted on what the user just did, we might as
well let her take it back.
I’m not familiar enough with the details of database theory to judge
whether it’s possible or not, but I’d guess that it would be safe to
undo an action if no later transaction has read the data the user just
touched and made other changes after having read it. The
RDBMS knows whether that’s the case or not.
Ease of Maintenance
Joe Nerd doesn’t have spare time or money for installing and
maintaing a database installation. The RDBMS has to take on
more of the job of maintaining itself and running smoothly, so the
developer can focus on adding the features that matters to the users
of his web site. I haven’t tried installing and maintaining too many
different RDBMS’, but I do know that Oracle is quite a beast to deal
with, and that both Interbase and PostgreSQL are much better.
The current trouble falls in two parts: Instaling it and making sure
it keeps running smoothly. Installation should be completely
straight-forward, and have reasonable defaults, so you don’t
even have to change any parameters before installing it. It can surely
be done.
Databases often get slower and slower as they get bigger. The RDBMS
should be able to automatically defragment as needed, grow as
needed, and in general do a reasonable job of running
smoothly without human intervention. I don’t have first-hand
experience with this, but others have told me that Interbase does an
excellent job here—proof that it can be done.
Conclusion
In summary, I believe that the twin effect of new users
starting to use databases for new things will have a profound
effect on the database market. Within the next three years, I expect
to see open source databases have a great impact, and I expect to see
them pioneering a number of innovations in RDBMS software.

crikey, dude; you post an xhtml validator link at the bottom of the page that, when clicked, shows that you have 66 errors in your markup. i’m sure you’re extremely competent which is why this surprises me.
just an observation…
That is kinda funny. The validator link is part of my template, and it’s probably the (old) content on the page messing validation up :)