Tuesday, December 16, 2008

Lock Management System

Lock Management System

Guard at Royal Palace in Gamla Stan in central Stockholm Relational database management systems exist to support concurrent users. If you don't have people simultaneously updating information, you are probably better off with a simple Perl script, Microsoft Access, or MySQL rather than a commercial RDBMS (i.e., 100 MB of someone else's C code).

All database management systems handle concurrency problems with locks. Before an executing statement can modify some data, it must grab a lock. While this lock is held, no other simultaneously executing SQL statement can update the same data. In order to prevent another user from reading half-updated data, while this lock is held, no simultaneously executing SQL statement can even read the data.

Readers must wait for writers to finish writing. Writers must wait for readers to finish reading.

This kind of system, called pessimistic locking, is simple to implement, works great in the research lab, and can be proven correct mathematically. The only problem with this approach is that it often doesn't work in the real world of hurried developers and multiple simultaneous users. What can happen is that an admin page on an ecommerce site, for example, contains a reporting query that takes an hour to run and touches all the rows in the users and orders tables. While this query is running none of the users of the public pages can register or place orders.

With the Oracle RDBMS, readers never wait for writers and writers never wait for readers. If a SELECT starts reading at 9:01 and encounters a row that was updated (by another session) at 9:02, Oracle reaches into a rollback segment and digs up the pre-update value for the SELECT (this preserves the Isolation requirement of the ACID test). A transaction does not need to take locks unless it is modifying a table and, even then, only takes locks on the specific rows that are to be modified.

This is the kind of RDBMS locking architecture that you want for a Web site and, as of 2003, it is provided only by Oracle and Postgres.

Full-text Indexing Option

Suppose that a user says he wants to find out information on "dogs". If you had a bunch of strings in the database, you'd have to search them with a query like
select * from magazines where description like '%dogs%';
A door in Skansen in Stockholm This requires the RDBMS to read every row in the table, which is slow. Also, this won't turn up magazines whose description includes the word "dog".

A full-text indexer builds a data structure (the index) on disk so that the RDBMS no longer has to scan the entire table to find rows containing a particular word or combination of words. The software is smart enough to be able to think in terms of word stems rather than words. So "running" and "run" or "dog" and "dogs" can be interchanged in queries. Full-text indexers are also generally able to score a user-entered phrase against a database table of documents for relevance so that you can query for the most relevant matches.

Finally, the modern text search engines are very smart about how words relate. So they might deliver a document that did not contain the word "dog" but did contain "Golden Retriever". This makes services like classified ads, discussion forums, etc., much more useful to users.

Relational database management system vendors are gradually incorporating full-text indexing into their products. Sadly, there is no standard for querying using this index. Thus, if you figure out how to query Oracle Text for "rows relating to 'running' or its synonyms", the SQL syntax will not be useful for asking the same question of Microsoft SQL Server with its full-text indexing option.

No comments:

Post a Comment