FoxProFoxPro Developer's Conference 94 |
Session 211
Applications
with
Large Databases
Val Matison
Matison Consulting Group Inc.
Dealing with the very large
This session will focus on issues such as performance, maintenance, and practicality as they apply to very large systems developed with FoxPro. You cannot be complacent about even the most fundamental issues when working with huge data sets. Small mistakes which are often overlooked in simple applications become magnified to render a large system useless. With proper planning however, and a thorough understanding of the application and its intended use, a robust system can be built to rival even mainframe based systems. This applies not only to the application but to the actual storage and management of data as well.
These notes are meant to complement the session where many of the concepts will be discussed in detail with examples.
Introduction
Large data sets
You may be asking yourself how does the management of very large volumes of data apply to me? The amount of data you manage today will be vastly different from tomorrows requirements. The downsizing / rightsizing revolution is here and brings with it volumes of legacy data that must be accessed, managed and archived. The data that resided on yesterdays big iron now sits on the desktop, stored in a few hard drives each the size of a bar of soap.
We are blessed / cursed to be part of the information age. We have the ability to access voluminous on-line information systems, link into corporate databases, or to create our own gigantic data sets. As more information becomes available, methods will be needed to efficiently manage it. New data types are affecting applications. Information is no longer restricted to character or numeric format. Still images, audio and video information can now be effectively stored and retrieved and are increasingly becoming part of normal application development.
Todays high end hardware systems are extraordinarily powerful and have a very high return on investment. Gigabyte hard drives are now in the $600-$700 range and computer memory is cheap. If hardware is inexpensive and information readily available, then the two will surely meet.
Unfortunately the state of software has not matched the growth rate of hardware, so its incumbent upon the application designer to create fast, stable, and flexible systems. The informed programmer / analyst can effectively use FoxPro to manage the huge volumes of information which were once inaccessible.
Who uses all this data? The types of applications that weve dealt with involve mailing lists, transaction logs, and storage of binary data. The latter type of system can store transactions from other applications, image or sound information.
FoxPros Suitability to the Task
Questions may exist as to whether other database systems should be used instead of FoxPro. Features such as transaction tracking, rollback functionality, concurrency control, and security are better implemented in other systems. This topic is under investigation by the author and the results of testing will be published when available. Increased functionality comes with a s price. Some of the problems that exist with a more complicated DBMS are support issues at the clients site, additional concerns about network protocols, and the hardware engine that will drive the application. A competent FoxPro developer can use FoxPros strengths to develop very high end systems that rival even client server systems.
Size restrictions
There are two types of size restrictions that affect the FoxPro developer. These are the total number of records in a table and the physical size of the table. The latter restriction will typically be the limiting factor in any application. FoxPro will always limit the size of a table to two gigabytes regardless of the operating system. This means that you cannot take your application and run it under an operating system that supports files larger than two gigabytes. This file size limitation refers to the DBF. The memo file can also grow to the two gigabyte limit as can the index file. This means than one logical table can consist of individual files that surpass four gigabytes.
FoxPro has a limit of one billion records per table. This doesnt make much sense as the table would be limited to a single field one byte wide. Only 256 of these records would be unique. A practical limit is about 15 to 20 million rows for average record lengths of about 100 to 150 bytes.
The rest of this paper will discuss large databases that consist of tables of more than one million records or individual files greater more than one gigabyte. We will discuss several areas of concern including the issues of performance, basic hardware issues that can negatively impact performance, data integrity, maintenance procedures for pack and sort operations, backup and restore procedures, and techniques to create logical volumes greater than two gigabytes. The techniques used are taken from real world applications developed by the author. In some respects this material is esoteric in nature and will apply to specific applications. It is assumed that the reader understands basic Rushmore optimization and SQL.
Most of our discussions will revolve around the use of Novell Netware. The majority of systems utilizing large data sets like these are typically stored on local area networks and Novell is the current industry standard network operating system. We are currently testing Windows NT 3.5 for optimal configurations.
Our testing is on going and if the reader has any particular questions not addressed in this paper, please forward your request to the author. We will endeavor to investigate these questions and to share the results with the FoxPro community.
Integrity
The Causes of Data Corruption
FoxPro is a fairly stable product and corrupted tables are infrequent but they can occur, often without your knowledge. It is even possible to have a perfectly well behaved application with corrupted data. My experience shows that corrupted data is usually the result of incompatible software drivers, faulty hardware, or apathetic end users. Only the first two in the previous list are in your control.
Compatible versions of software drivers are particularly important in a network environment and complete testing is necessary to insure that you have a stable system as problems are often not readily apparent. There are drivers for the network, the interface card, Windows, the workstation hard drive, network hard drive, tape backup, and so on. Creating a stable configuration can break down into a hit and miss adventure since its rare that one manufacturer knows all of the combinations of other equipment that interfaces with their own. The installation of a new version of one component can often lead to problems with all of the other systems so a holistic approach is recommended.
Faulty hardware is usually easier to spot and repair although marginal equipment can be difficult to diagnose. Network concentrators and interface cards can usually cause incorrect packets to be transmitted or received. My advice with respect to drivers and hardware is to stick with name brand equipment if only to get better technical support. Cabling is of paramount importance. It is imperative that you use the correct type of cables for your network. Cabling within your PC and the file server is also important. SCSI cables are often the source of unknown drop out errors including the disappearance of entire network volumes.
End users often have an odd idea of how to exit an application gracefully. The power switch is often equivalent to a quit menu option. This can cause problems if a reindex is in progress or buffered data has not been written to disk.
Methods of Error Detection
There is an important difference between invalid and corrupt data. In the first case, validation reflects business rules and must be managed within the application. Even if the business rules have been written correctly, and validation enforced, corrupt data can get into our tables. If a character is stored in a numeric or date field then that field is corrupted.
A table may have three files associated with it including the data file, a compound index, and possibly a memo file which by default have file extensions of DBF, CDX and FPT respectively. Each file is made up of a header area that describes the file structure and data that makes up the balance. An error that will cause immediate problems will usually reside in the header. Information in the header may have the proper format yet still cause problems with the application. The FoxPro error message "Not a database file" is often caused by a record count placed in the header which is greater than the number of records in the physical table. Corrupted information that resides within the data area may go completely unnoticed, even if calculations are performed on that data.
To determine whether invalid information has been written to a DBF we must use low level file functions to check for correct data. Each data type is stored in a particular manner and it is possible to isolate some problems by validating each component on a byte by byte basis. A date field for example is physically stored as YYYYMMDD, with each byte being numeric. If any of the bytes are character, then this field has corrupt information.
In the event that invalid information has been written to our table, we must have the means to resolve these problems. Each DBF and FPT file has information about the structure of that file embedded in the header. It is possible to use the low level file open commands to parse the header and the data to determine if the header information matches the data and if the data is of the correct type.
Some data types are more difficult to check for accuracy such as character and memo fields. Checksums can be used to ensure that data transfers are accurate. This includes procedures such as reading a record for display purposes. When transferring data from workstation to the file server, a checksum should accompany the data, and when retrieved checked against the data to ensure integrity.
Routines of this nature need to be executed as quickly as possible and should be written in C and linked with FoxPro via a PLB or FLL.
Data Storage
If your applications consist of small table sizes, you probably have not been too concerned about the location of your files. In a network environment the application data is usually stored on a single network volume. Each users local hard drive is used for temporary files created during queries or index creation and so on. There are performance and fault tolerance concerns with large files and both are dependent on the way the data is stored. The physical location of a single file may span multiple drives that appear as one logical device.
It can be beneficial to review the structure of the data that is stored in a record. It may be possible to implement a compression scheme to save storage space.
Data compression
We use data compression to minimize storage requirements. There is a trade off in performance in read and write operations and there can be problems when the data is removed from the controlling application. Data compression does not refer to schemes such as dblspace or stacker. In fact we avoid these compression schemes where possible due to the performance penalties in using these technologies. This discussion does not include third party schemes such as lha or pkzip either.
There are benefits of using compression particularly when storing binary data. Often, you know the structure of the data that makes up each individual block and therefore know ahead of time what the ideal compression scheme should be. This permits the creation of an optimized in-house compression method. The benefit of using your own data compression scheme is that you have control over it. This scheme can be changed as the data changes and can be turned on or off at will. You are not dependent on the whims of operating system vendors or third party publishers.
The neutralizing aspect of compression is that data must be decompressed and the resultant overhead may create a negative impact upon the application. Of course this has to be weighed against the benefits of storing more information within the same amount of storage area. If an effective compression ratio of say 3:1 can be obtained, the performance benefits obtained by moving smaller amounts of data may offset the penalty of decompressing the data at the workstation. Its up to you to determine the ratio of compression that outweighs the extra decoding involved.
Physical partitions
If your tables can fit on one hard drive, then many of your storage problems may be solved. Single drives of nine gigabytes are available so its entirely possible to store all of your data in one place. The performance and maintenance penalties tend to outweigh any benefits of a single drive solution.
Novell recommends partitioning hard drives into large logical volumes for peak performance. An eight gigabyte logical volume is created by using four two gigabyte drives. The size of the volumes is limited by the size and number of available hard drives and by the amount of memory within the server. This can be counter productive however. If any one of the drives breaks then the entire volume has to be rebuilt. This means software may have to be reinstalled and data restored from backup. Even with very fast tape drives, a restore of eight gigabytes can take a considerable amount of time.
Data Access
File opening routines must be robust and allow for full paths to be assigned to each individual file. Drive letters and paths should be left separate and concatenated at runtime to determine the proper path to tables. Novell permits you to map logical drive letters dynamically and this can be used to your advantage for maintenance purposes. In the event of potential problems for any given hard drive, we have the option of copying data to a different volume either from tape or the current volume and then changing the path to the volume in our file open routines. Using this logical drive technique could even allow you to restore your data to a different file server all together and then use the other file server for data access. You can also use alias names to access alternate images of your data.
File open routines must have intelligence built into them to let the user know the current status of a file. It may be accessible in which case theres no problem but in the event of an archiving process or data currently undergoing recovery, you must let the user know of the current procedure taking place. Once data is resorted into the system, the user should have the capability of reopening the tables just restored.
Large logical partitions
In order to create a logical file of more than two gigabytes, some type of segmentation must be employed to partition your data into physical sizes that are acceptable by both FoxPro and the operating system. This implies that you have a method of recombining the data so that the user perceives continuity within the set.
Segmentation
Segmentation can take place at an arbitrary physical boundary, logical boundary or vertical boundary via decomposition of the data. A physical boundary is reached once the two gigabyte ceiling is reached. A new file is created and current information is placed into the new file. The logical view of the data is merely a union of the two tables. This system could be applicable in a transaction environment where data is retained for archival purposes but rarely changed. Scientific data collection is one possible application to use such a segmentation scheme.
A logical boundary is quite often possible since most large data sets lend themselves to partitioning by groups. These could be geographic, demographic, chronological or other logical groups that exist within your data. A key benefit of this scheme is that employees are usually assigned to one of the logical groups such as a region of the country. If small groups of users are accessing the partitioned data then overall throughput of the system will be improved.
Decomposition of a table involves a vertical split of the data into multiple sub tables, leaving several key fields in one table and the remaining support information in the other table. Each record of the master table has a reference to the location and name of the table containing the rest of the data that pertains to the entire logical record. The benefit of this system is that the main table which has key values used in queries could be stored on a faster device while the support table could be stored on a separate device such as hard drive or even optical juke box. The problem with this approach is the additional overhead involved in maintaining the links between the tables.
The following diagram shows three different physical views of the same table. Normal data consists of one table where all data resides. Note that field five is binary in the form of a memo field. The second image displays data which has been partitioned into separate blocks. This method of partitioning data is the least desirable if the data is contiguous across the partition as queries may have to span boundaries. There will also be more overhead in processing the data when maintenance operations are used. However it can be effective if there are logical partitions for the data such as geographic boundaries or chronological data. Should queries require information from all partitions, individual queries would work against the smaller sets and the results unioned if needed. The last image shows data that has been vertically split. this solution is ideal where images are involved since they can be stored on low speed optical storage devices.
Maintenance
No system exists that does require periodic maintenance. Structures may need modification, data may need to be relocated to faster hardware, or information may need to be archived. A well behaved system will take maintenance issues into account to allow users to remain as productive as possible, even during times of unscheduled maintenance.
Your system should allow users to work with parts of the application even if some of the tables are unavailable for maintenance purposes like indexing or packing. Its infrequent that users will need all data at all times so disabling segments of the application is reasonable.
Its important to understand how your application affects the file components such as the memo file or index. Indexes grow when a record is added. They also grow by changing any part of the key that was used in the index expression. Many developers recycle records to avoid packing by storing blanks in each field of a deleted record. That record is then reused when a new record is added to the system. For each combination of delete and replace, the index grows. To obtain the smallest index, make sure your data is sorted in the same order as your tag expression.
It will be a requirement that the PC that actually performs the maintenance must have a sufficiently large hard drive, fast processor and lots of system ram to perform functions to peak efficiency. When creating indexes on tables with millions of records we ran into problems with some of our hard drives since they had insufficient hard drive space for the temporary files that are created during the indexing process.
The placement of the different components of your tables may affect the performance of queries. Our testing in this area is incomplete but we have noticed improvements in speed when data is located in several different drives. The graph below suggests that query times were reduced in some instances. You may also want to separate CDXs from tables and to place them on completely different drives.
Sorting your data in the way it will be most commonly accessed is also extremely important. Our tests show that a simple scan of a table can be 1500% faster if the data is scanned in natural order as opposed to index order.
Free space
There should be sufficient space on the file server for maintenance functions. Several types of procedures may require large amounts of free space such as backup and restore, indexing, sorting, preallocation of space and so on. This space could be as much as several gigabytes, and is usually dependent on your largest files. Free storage areas should also exist on the hard drive of the local machine thats involved with maintenance functions.
Pre-Allocate Space
When working with very large data sets, the data base administrator must determine when new space must be assigned for working tables. This procedure must be monitored before significant data entry is proposed. Since the size of the tables will grow, you must ensure that no other process will use allocated space on the same data drive used by your application. This method is commonly used in SQL systems such as Oracle or Sybase. Unfortunately, theres no way to automatically preallocate this space within FoxPro.
Preallocation for binary data is particularly important. If your system uses normal fielded information then the largest amount of information that can be added is limited to a single record. A different problem exists when you are using memo or general fields. A single record could consist of one 2 gigabyte block. Often there is no way to know this before data is actually added to the system.
There are several methods to ensure theres room available for data when you need it. The simplest method and the one I prefer is to deny disk access to users unless they are using the application. Unfortunately this may not always be possible so other methods must be used to actually preallocate space. Since FoxPro uses fixed length fields we can add deleted blank records to the table to actually fill up the space. It would be simple enough to develop a scheme that could find the next available free record and to use that when adding new information. Many developers already use this method to actually avoid dead space within files. The down side is that your data files are always at the maximum size so your are always taxing the system regardless of their logical size. There are also performance penalties when employing this method. You may also need to keep additional logs of actual physical record counts since the "reccoun()" function ignores deleted records.
Another method of allocating space is to add a group of files to the volume that do nothing else except take up space. As more space is needed by the application, individual files can be erased by the FoxPro application. A two gigabyte partition could be preallocated using twenty 100 meg files. This can cause problems with the network operating system since resources must be used to hash the directories, and to manage the data after deletion for salvage purposes. This becomes apparent when you try to perform a backup and the tape backup software has insufficient cache memory to perform the backup.
Using a file manager similar to that listed in the picture will allow you to achieve some level of data independence. The fields listed below represent both logical and physical information.
Backup & Restore
One of the unfortunate problems with FoxPro is that we cannot backup the data in real time while we are actually working with it. While a user has a table open, the backup software cannot get access to the file and must skip over this information. There are two alternatives that exist that permit backups to proceed and they include backing up in off hours or taking individual copies of the data set and then backing up that data.
The first alternative is the simplest to implement. At a predetermined time just activate the backup program to copy your tables to tape. Make sure your tables are closed so that data can be backed up and that no users have access to the database during backup hours. The down side with this approach is that users are unproductive during the backup process.
The other alternative is to create a duplicate set of data while users are using the system. The backup software is then instructed to back up the duplicate data, and not the original. A temporary work area is preassigned for strictly backup and restore procedures. Data is copied into this area using FoxPro so normal activities on the tables can continue. Once the copy is complete, a command is sent to the tape backup software to begin the process. This is best accomplished if you can communicate with your tape backup software directly.
The restore procedure would work in a similar fashion. Data would be restored to the temporary directory from the backup. The FoxPro application would then look at the data in the work directory and decide on a record by record basis whether to add this information back into the active database.
Hardware
There are many decisions that should be made to ensure the best performance, fault tolerance, cost effectiveness and future compatibility of your application. While an in depth discussion about this is beyond the scope of this paper, there are hardware considerations that directly affect the application developer.
More storage options exist than ever before. Optical drives are being used for archival purposes and even backup. IDE hard drives are fast and very inexpensive, and the new enhanced specification permits maximum drive sizes of eight gigabytes. SCSI drive technology is also on the move. Seagate now manufactures a nine gigabyte hard drive with storage costs of approximately $0.50/megabyte. Theres every reason to believe that newer technologies will bring more cheap storage to the end user.
There will be many times when hardware becomes the limiting factor in your application. Networks are inherently complex and there are many possible sources for error, many less obvious than others. Lets take a look at the data path after the enter key is pressed. Of course Im assuming you entered the correct data in the first place! The data path could include system memory cache, data bus, hard drive controller, hard drive, network interface card, network software drivers and protocol layers, physical transport through the cable to a concentrator, network repeaters, file server interface card, file server cache, file server controller cache, and finally the file server hard drive. In fact there are so many paths that a byte has to travel its miraculous that it gets there at all.
Raid
Raid devices are becoming more prevalent in large systems as prices have declined and the technology has been accepted by a large number of vendors. RAID is an acronym for Redundant Array of Inexpensive Disks, a term which aptly describes the purpose of the technology. A Raid system is a collection of hard drives which combined in one of several configurations can provide data redundancy which is inexpensive yet effective. If one device fails, then data can be recovered from one or all of the other disks on the system. A side benefit of increased performance may be possible, depending on the RAID configuration. The term "levels" is often used to describe the different implementations of RAID. This can be confusing in that higher "levels" do not necessarily incorporate features of lower levels. That said I will use the term "level" when discussing each implementation of RAID since it is common practice to describe RAID in that manner.
Level 1: Mirrored drives
Information is stored on pairs of identical hard drives. If one drive fails data can still be retrieved from the other drive. This is the simplest RAID implementation yet it is very effective and is often used within a Novell environment. Writes are slow in this configuration since both drives must write information before a write task is completed.
Read access may be improved since two drives are used to access information. In a multitasking operating system different read requests could be sent to each drive, satisfying twice as many read requests in a given amount of time. If the same read request is given to each drive, data can be retrieved from the drive that accesses the information first, effectively reducing the average seek drive for just one of the drives.
This is the most expensive RAID solution as two drives are always required to store one drive of data. It is also supported by many controller manufacturers and network operating systems since it is simple.
Level 2: Hamming Code used for error detection
This method is based on work done by R. W. Hamming who showed that if an error was likely to affect one bit in a group, then the amount of overhead required to detect it could be minimal. Additional drives are required for error correction. For example, a group of 8 data drives requires 3 ECC drives. Each bit of data is interleaved across a group of disks with additional disks used for check disks. Writes are very slow and require all drives to respond. However, reads can be quick since all drives can respond to the read request simultaneously. That said, level one outperforms this level on almost all operations except large block transfers. This system is actually too robust and there is a performance penalty for all the extra check disks.
Level 3: Interleaved data
This implementation utilizes the fact that we can detect errors on any drive so only one drive is used to maintain redundant data necessary to correct the error. Blocks of data are striped across all of the data drives with byte 0 of a block on drive 0, byte 1 of a block on drive 1 and so on. A single drive is used as a check disk. Should a failure occur on one of the drives, the data can reconstructed from all of the other drives using an exclusive or of the remaining drives.
Every drive is involved in a read or write transaction. Transfers of large data blocks are handled very well. Since the data is striped across all drives, they can act in parallel achieving excellent throughput for large reads. This solution lends itself to applications such as imaging or video. A disadvantage of this system is the large sector sizes introduced by adding additional disks. Large amounts of data may have to be read in order to fulfill small read requests. Smaller block sizes result in poor performance when using this system.
Level 4: Sectors striped across drives.
In this implementation, sectors are written across drives in sequential manner, block 0 on drive 0, block 1 on drive 1 and so on. Data is interleaved at a sector level rather than bit level. Only one disk is used for error correction. A read involves only one drive and performance is identical to a system with a single drive. Since one drive is involved with a read, a multi-tasking operating system could issue multiple read requests for data on other drives. A write operation requires two drives, the data drive and the ECC drive. You may expect the other drives to be involved however the ECC information can be read from those drives to compute the new ECC. This means that only one write transaction can be performed at any one time.
The system administrator must understand how and when information will be used. In order for this system to be effective, data must be grouped into distinct subsets. This is impossible to achieve when large databases are involved. This system is ideal where the ratio of reads to writes is very high, but the benefits are obtained only in multitasking operating systems.
Level 5: Data and error correction striped across all drives
The benefits of this implementation are that all drives are involved in write operations. As in level four, an entire block is written to a single drive but the ECC data is distributed across all drives. Should any drive fail, data could be retrieved by reading the other drives. This system has the advantage of performing parallel reads and writes.
Applications with
Extremely Large Databases
(c)1994 Val Matison