April 25, 2006
Database Normalization and Joins
First session of Tuesday morning at the MySQL Users Conference (2006) is Mike Hillyer and Arjen Lenz presenting An Introduction to Database Normalization and Joins.
Not a new subject for me, but I've heard good things about this presentation (has been given at a number of conferences) so want to see what it's about.
What is the overall idea with database normalization?
- cure "spreadsheet syndrome"
- store minimal info
- remove redundancies
- remove anomalies (accidental differences because data is entered multiple times)
- restructure data
Presentation will cover the first three forms of normalization. Mike shows an example of a book table with lots of data piled up into single column rows.
First Normal Form
- all values atomic - every column contains the least amount of information possible - must look at your needs - somtimes firstname lastname is OK in the same column but most often it needs to be broken out
- each row must be unique
Solving it means breaking data into separate tables.
Forming RelationshipsThere are three forms of relationships:
- one to one - expect none or one row in the other table (ie blob data sitting in a separate table)
- one to many - expext one or many
- many to many (joining table) - contains two identifiers, one for each of the represented joined tables
Second Normal FormData must be in first normal form first. Second normal is about composite keys. All columns in a row must refer to the entire primary key.
Third Normal FormAll columns must depend directly on the primary key. "The key, only the key, and nothing but the key, so help me Codd." An example is having a an author table with address, city and zip. The zip isn't necessarily part of the user, it belongs to a city. Mike points out that to get this form of normalization with the zip code example requires many tables, often more than is really required.
(Arjen takes over for the second part of the talk on JOINs)
A JOIN is really about sets (Venn diagrams). Arjen shows a few diagrams with geometric shapes surrounding groups of numers to demonstrate the data that will be returned in the resultset.
Inner gets the sets of common data between two tables. Outer gets the common values plus the extras in the right or left depending on what you specify.
Presentation will be on Mike's site.
Posted by mike at April 25, 2006 1:55 PM