Tuesday, December 16, 2008

Brave New World

Brave New World

Old Volvo.  Sodermalm.  Stockholm, Sweden Training an African Grey parrot to function as an information systems manager can be very rewarding. The key sentence is "We're pro-actively leveraging our object-oriented client/server database to target customer service during reengineering." In the 1980s db world, the applicable portion of this sentence was "client/server" (see next chapter). In the Brave New World of database management systems, the key phrase is "object-oriented."

Object systems contribute to software reliability and compactness by allowing programmers to factor their code into chunks that are used as widely as possible. For example, suppose that you are building a catalog Web site to sell magazines, videos, books, and CDs. It might be worth thinking about the data and functions that are common to all of these and encapsulating them in a product class. At the product level, you'd define characteristics such as product_id, short_name, and description. Then you'd define a magazine subclass that inherited all the behavior of product and added things like issues_per_year.

Programmers using modern computer languages like Smalltalk and Lisp have been doing this since the mid-1970s, but the idea has only recently caught on in the RDBMS world. Here are some table definitions for the Illustra system, a derivative of the U.C. Berkeley Postgres research project:


create table products of new type product_t
(
product_id integer not null primary key,
short_name text not null,
description text
);
Then we define new types and tables that inherit from products...
create table magazines of new type magazine_t (
issues integer not null,
foreign_postage decimal(7,2),
canadian_postage decimal(7,2)
)
under products;
create table videos of new type video_t (
length_in_minutes integer
)
under products;
Having defined our data model, we can load some data.
* insert into magazines (product_id,short_name,description,issues)
values (0,'Dissentary','The result of merging Dissent and Commentary',12);
* insert into videos (product_id,short_name,description,length_in_minutes)
values (1,'Sense and Sensibility','Chicks dig it',110);
* select * from products;
---------------------------------------------------
|product_id |short_name |description |
---------------------------------------------------
|1 |Sense and Sensibility|Chicks dig it|
|0 |Dissentary |The result o*|
---------------------------------------------------
Suppose that our pricing model is that magazines cost $1.50/issue and videos cost 25 cents a minute. We want to hide these decisions from programs using the data.

create function find_price(product_t) returns numeric with (late)
as
return 5.50;
So a generic product will cost $5.50.
create function find_price(magazine_t) returns numeric
as
return $1.issues * 1.50;
create function find_price(video_t) returns numeric
as
return $1.length_in_minutes * 0.25;
The appropriate version of the function find_price will be invoked depending on the type of the row.
* select short_name, find_price(products) from products;
---------------------------------------
|short_name |find_price |
---------------------------------------
|Sense and Sensibility| 27.50|
|Dissentary | 18.00|
---------------------------------------
This doesn't sound so impressive, but suppose you also wanted a function to prepare a special order code by concatenating product_id, price, and the first five characters of the title.
create function order_code(product_t) returns text
as
return $1.product_id::text ||
'--' ||
trim(leading from find_price($1)::text) ||
'--' ||
substring($1.short_name from 1 for 5);
* select order_code(products) from products;
-----------------
|order_code |
-----------------
|1--27.50--Sense|
|0--18.00--Disse|
-----------------
This function, though trivial, is already plenty ugly. The fact that the find_price function dispatches according to the type of its argument allows a single order_code to be used for all products.

This Brave New World sounds great in DBMS vendor brochures, but the actual experience isn't always wonderful. Back in 1995, we were using Illustra and built ourselves a beautiful table hierarchy more or less as described above. Six months later we needed to add a column to the products table. E.F. Codd understood back in 1970 that data models had to grow as business needs change. But the Illustra folks were so excited by their object extensions that they forgot. The system couldn't add a column to a table with dependent subclasses. What should we do, we asked the support folks? "Dump the data out of all of your tables, drop all of them, rebuild them with the added column, then load all of your data back into your tables."

Uh, thanks...

Braver New World

Millesgarden. Stockholm, Sweden If you really want to be on the cutting edge, you can use a bona fide object database, like ObjectStore (http://www.objectstore.net). These systems persistently store the sorts of object and pointer structures that you create in a Smalltalk, Common Lisp, C++, or Java program. Chasing pointers and certain kinds of transactions can be 10 to 100 times faster than in a relational database. If you believed everything in the object database vendors' literature, then you'd be surprised that Larry Ellison still has $100 bills to fling to peasants as he roars overhead in his Gulftream jet. The relational database management system should have been crushed long ago under the weight of this superior technology, introduced with tremendous hype in the early 1980s.

After 20 years, the market for object database management systems is about $100 million a year, less than 1 percent the size of the relational database market. Why the fizzle? Object databases bring back some of the bad features of 1960s pre-relational database management systems. The programmer has to know a lot about the details of data storage. If you know the identities of the objects you're interested in, then the query is fast and simple. But it turns out that most database users don't care about object identities; they care about object attributes. Relational databases tend to be faster and better at producing aggregations based on attributes. Millesgarden. Stockholm, Sweden

Keep in mind that if object databases ever did become popular, it would make using Java or C# as a page scripting language much more attractive. Currently the fancy type systems of advanced computer languages aren't very useful for Internet application development. The only things that can be stored persistently, i.e., from page load to page load, in an RDBMS are numbers, dates, and strings. In theory it is possible to write a complex Java program that does an SQL query, creates a huge collection of interlinked objects, serves a page to the user, then exits. But why would you? That collection of interlinked objects must be thrown out as soon as the page is served and then built up again the next time that user requests a page, which could be 1 second later or 2 days later. If on the other hand the objects created by a C# or Java program were efficiently made persistent it would make applying these big hammers to Web scripting a lot less ridiculous.

No comments:

Post a Comment