Today we begin a series of posts on data normalization. We will talk about functional dependencies, a concept that needs to be explained before we dive deeply into database schema normalization.
The subject is rather abstract and theoretical but I will try to restrain myself from going too deep into mathematics. I will try to keep things simple and down-to-earth. (The operative word being try 😉 )
Simple example: Person Table
The data in a table are usually not independent. The values in one column can determine the values in other columns. We’ll explain this using a simple example.
Example data in the table might be like this.
|SSN||First name||Last name||Date of birth||Address||Phone number|
Here, the value in the column SSN (Social Security Number) determines the values in columns first name, last name, date of birth, address and phone number. This means that if we had two rows with the same value in the SSN column, then values in columns first name, last name, date of birth, address and phone number would be equal. A person with SSN 123-98-1234 is always called Cindy Cry, is born on 15-05-1983, and so on. A situation like this is called functional dependency.
The notation for functional dependency:
ssn → first_name ssn → last_name ssn → date_of_birth ssn → address ssn → phone_number
In short, we might write it like this:
ssn → first name, last name, date_of_birth, address, phone_number
On the left hand side of the arrow we put the name of the column the other is dependent on. On the right hand side we put the name of the column that is dependent.
One thing is worth clarifying before we go on: a functional dependency is a constraint in the database schema, not in the data. You may notice that in our example all people named John are born on 30-01-1980. But it does not mean that we have a functional dependency:
first_name → date of birth
The constraint is not generally true for all data that might come into our table. It’s only a coincidence in our data. Coincidences don’t count. To find a functional dependency you only look for constraints that are generally true, for all possible data.
Another simple example: car table
The obvious functional dependency is this:
vin → brand, model, production_year, price
The column vin determines the brand, the model, the production year and the price columns.
A careful reader might ask: what about a dependency like this:
price → price
Surely the price column determines its own value? You’re right. A dependency like this is called trivialfunctional dependency. You usually omit trivial dependencies if you’re writing up all functional dependencies in a table.
Example: students table
Not all functional dependencies have a single column in the left-hand side. Take a look at the students table.
|Cindy Cry||6||Databases 101||Jack Magpie|
|John Novak||4||Databases 101||Margaret Beettle|
|Allan Smith||6||Algorithms||Paul Reason|
The student determines the semester they’re currently at, so we have a functional dependency.
student → semester
But there is one more functional dependency here. Student and lecture together determine the name of the teaching assistant tutoring the student. The lecture only is not enough: some lectures have more than one teaching assistant. The student only is not enough either: students attend many different lectures. Without the lecture name we don’t know which class we want to find a teaching assistant for. We note the functional dependency like this:
student, lecture → teaching assistant
And you always have plenty of trivial dependencies:
student, semester → student student, semester → semester lecture, semester, teaching assistant → lecture, teaching assistant