Performance
There are several ways to achieve high performance. One is to buy a huge multi-processor computer with enough RAM to hold the entire data model at once. Unfortunately, unless you are using PostgreSQL, your RDBMS vendor will probably give your bank account a reaming that it will not soon forget. The license fee will be four times as much for a four-CPU machine as for a one-CPU machine. Thus it might be best to try to get hold of the fastest possible single-CPU computer.
If you are processing a lot of transactions, all those CPUs bristling with RAM won't help you. Your bottleneck will be disk spindle contention. The solution to this is to chant "Oh what a friend I have in Seagate." Disks are slow. Very slow. Literally almost one million times slower than the computer. Therefore the computer spends a lot of time waiting for the disk(s). You can speed up SQL SELECTs simply by buying so much RAM that the entire database is in memory. However, the Durability requirement in the ACID test for transactions means that some record of a transaction will have to be written to a medium that won't be erased in the event of a power failure. If a disk can only do 100 seeks a second and you only have one disk, your RDBMS is going to be hard pressed to do more than about 100 updates a second.
The first thing you should do is mirror all of your disks. If you don't have the entire database in RAM, this speeds up SELECTs because the disk controller can read from whichever disk is closer to the desired track. The opposite effect can be achieved if you use "RAID level 5" where data is striped across multiple disks. Then the RDBMS has to wait for five disks to seek before it can cough up a few rows. Straight mirroring, or "RAID level 1", is what you want.
The next decision that you must make is "How many disks?" The Oracle9i DBA Handbook (Loney 2001; Oracle Press) recommends a 7x2 disk configuration as a minimum compromise for a machine doing nothing but database service. Their solutions start at 9x2 disks and go up to 22x2. The idea is to keep files that might be written in parallel on separate disks so that one can do 2200 seeks/second instead of 100.
Here's Kevin Loney's 17-disk (mirrored X2) solution for avoiding spindle contention:
Now that you have lots of disks, you finally have to be very thoughtful about how you lay your data out across them. "Enterprise" relational database management systems force you to think about where your data files should go. On a computer with one disk, this is merely annoying and keeps you from doing development; you'd probably get similar performance with a zero-administration RDBMS like PostgreSQL. But the flexibility is there in enterprise databases because you know which of your data areas tend to be accessed simultaneously and the computer doesn't. So if you do have a proper database server with a rack of disk drives, an intelligent manual layout can improve performance fivefold.
Disk Contents 1 Oracle software 2 SYSTEM tablespace 3 RBS tablespace (roll-back segment in case a transaction goes badly) 4 DATA tablespace 5 INDEXES tablespace (changing data requires changing indices; this allows those changes to proceed in parallel) 6 TEMP tablespace 7 TOOLS tablespace 8 Online Redo log 1, Control file 1 (these would be separated on a 22-disk machine) 9 Online Redo log 2, Control file 2 10 Online Redo log 3, Control file 3 11 Application Software 12 RBS_2 13 DATA_2 (tables that tend to be grabbed in parallel with those in DATA) 14 INDEXES_2 15 TEMP_USER 16 Archived redo log destination disk 17 Export dump file destination disk
[Another area in which the commercial databases can be much faster than PostgreSQL is dealing with 50 simultaneous transactions from 50 different users. In a naive RDBMS implementation these would be lined up in order, a log of transaction 1 would be written to disk and then the database user who requested it would be notified of success, transaction 2 would then be processed and, after another disk write, be declared a success. Oracle, however, is smart enough to say "about 50 transactions came in at nearly the same moment so I'll write one big huge block of info to disk and then get back to all 50 users informing them of the success of their transaction." Thus performance in theory could 50 times better with Oracle (1 disk write) than with PostgreSQL (50 disk writes).]
No comments:
Post a Comment