
Databases

Fundamentally, a Data Store takes structured data and preserves it across program invocations for later retrieval. Traditionally this was done using 'flat files', which were organised for the convenience of the program.

Flat files are one of the simplest forms of data storage, usually used when the data structure is fairly simple: the records are typically sorted on a single 'Key' attribute, with one record per line.

Most people will have encountered such files as ".CSV" (Comma Separated Value) files and simple "XML" files, and they are so simple to use that they continue to be used for simple purposes such as a dictionary list or a timeline.
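As a concrete sketch (the dictionary file here is hypothetical, not from the original), a flat file keyed on a single attribute can be read in a few lines of Python:

```python
import csv
import io

# A hypothetical flat file: one record per line, sorted on the 'word' key.
flat_file = io.StringIO(
    "word,definition\n"
    "apple,a round fruit\n"
    "banana,a long yellow fruit\n"
    "cherry,a small red fruit\n"
)

# Load the records into a dictionary indexed by the single key attribute.
records = {row["word"]: row["definition"] for row in csv.DictReader(flat_file)}

print(records["banana"])  # a long yellow fruit
```

Note that the whole file is indexed on one key; looking records up by definition instead would mean re-scanning or re-indexing the data, which hints at the limitation discussed next.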

Unfortunately, flat files start to lose their attraction as soon as the data structure gets too big or complex, or when programs want to access the same data using a different 'Key' attribute, or when programs use overlapping and thus incomplete sets of data.

Both Databases and Flat Files have the following properties in common:

  1. Data represents Information about Objects
  2. Relationships between Objects are represented by storing the same values in multiple records
  3. Applications may share data
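Property 2 above can be sketched with a pair of hypothetical record sets, where an order refers to its customer only by repeating the customer's id value:

```python
# Hypothetical records: the relationship between a customer and their
# orders is represented solely by the shared 'customer_id' value.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Brian"},
]
orders = [
    {"order_id": 101, "customer_id": 1, "item": "widget"},
    {"order_id": 102, "customer_id": 1, "item": "gadget"},
    {"order_id": 103, "customer_id": 2, "item": "sprocket"},
]

# To follow the relationship, match on the stored value.
ada_orders = [o["item"] for o in orders if o["customer_id"] == 1]
print(ada_orders)  # ['widget', 'gadget']
```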

Flat files have a couple of additional properties, which make them less than ideal:

  1. Programs are central, and the data is stored for the convenience of the program
  2. Access Routines have to have the data structure of the files hard-wired into them, and appear in multiple programs
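The second point can be sketched as follows (the record layout is hypothetical): every program reading the file must repeat the same assumptions about its structure, so any change to the file breaks every copy of the routine.

```python
# Each program repeats the same hard-wired knowledge of the record layout:
# the field order, separator, and types are all baked into the code.
def read_employee(line: str):
    # Assumes exactly three fields, in exactly this order.
    name, dept, salary = line.strip().split("|")
    return {"name": name, "dept": dept, "salary": int(salary)}

record = read_employee("Ada|Research|52000\n")
print(record["salary"])  # 52000

# If a fourth field is ever added to the file, this function (and every
# copy of it in other programs) must be found and changed by hand.
```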

This causes the following problems:

  1. Data is structured and collected for current usage. Unfortunately, in any successful system the usage changes, and you then face extra cost to cope with those changes.
  2. Data is spread out over a number of files, accessed by multiple programs, possibly with different representations for the data. This makes data management hard.
  3. The system is backward-looking, organised around how things were done, and hard or costly to change. This makes the system fragile and unstable over time.

Because of these problems, databases were developed, but in an ad-hoc fashion. Eventually two main types emerged, the Hierarchical database and the Network database, both built around interlinked files which were optimised for fast retrieval. The advantages of these, despite their ad-hoc nature, led to a general move towards database technologies and away from the previous file-oriented approach.

Because they were ad-hoc, no one really knew which optimisations were best, leading Edgar F Codd of IBM to produce a series of papers between 1970 and 1974 which laid out a theoretical basis for thinking about databases, one that has been at the centre of research ever since.

This Relational Database model had to introduce a number of new terms, as the existing ones were often too ambiguous or inexact for the level of mathematical rigour required.

Because these terms were unfamiliar, yet some of them mapped loosely onto the common terms, many people have misunderstood the model and proposed alternatives which, when analysed, look like hierarchical or network databases, complete with the very problems the relational model was developed to deal with.

Relational databases are about collecting a large number of mainly stable facts in a minimally redundant manner, which are then used to generate the views of the data that users need to see.
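This idea can be sketched using Python's built-in sqlite3 module (the table and column names are illustrative): each fact is stored once, and views derive whatever presentation a user needs.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Each fact is stored once, with minimal redundancy.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(id),
                           item TEXT);
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Brian');
    INSERT INTO orders   VALUES (101, 1, 'widget'), (102, 2, 'sprocket');

    -- A view generates the presentation a user needs,
    -- without duplicating any of the stored facts.
    CREATE VIEW order_report AS
        SELECT c.name, o.item
        FROM customer c JOIN orders o ON o.customer_id = c.id;
""")

for name, item in db.execute("SELECT name, item FROM order_report ORDER BY name"):
    print(name, item)
```

The customer's name appears only once in the stored data; the view recombines the facts on demand, which is the opposite of the flat-file approach of repeating values wherever a program happens to need them.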