Today we begin a series of posts on data normalization. We will talk about functional dependencies, a concept that needs to be explained before we dive deeply into database schema normalization.

The subject is rather abstract and theoretical but I will try to restrain myself from going too deep into mathematics. I will try to keep things simple and down-to-earth. (The operative word being try 😉 )

Simple example: Person Table

The data in a table are usually not independent. The values in one column can determine the values in other columns. We’ll explain this using a simple example.

 

Example data in the table might be like this.

SSN First name Last name Date of birth Address Phone number
123-98-1234 Cindy Cry 15-05-1983 Los Angeles 123-456-7891
121-45-6145 John O’Neill 30-01-1980 Paris 568-974-2562
658-78-2369 John Lannoy 30-01-1980 Dallas 963-258-7413

 

Here, the value in the column SSN (Social Security Number) determines the values in columns first name, last name, date of birth, address and phone number. This means that if we had two rows with the same value in the SSN column, then values in columns first name, last name, date of birth, address and phone number would be equal. A person with SSN 123-98-1234 is always called Cindy Cry, is born on 15-05-1983, and so on. A situation like this is called functional dependency.

The notation for functional dependency:

ssn → first_name
ssn → last_name
ssn → date_of_birth
ssn → address
ssn → phone_number

In short, we might write it like this:

ssn → first name, last name, date_of_birth, address, phone_number

On the left hand side of the arrow we put the name of the column the other is dependent on. On the right hand side we put the name of the column that is dependent.

One thing is worth clarifying before we go on: a functional dependency is a constraint in the database schema, not in the data. You may notice that in our example all people named John are born on 30-01-1980. But it does not mean that we have a functional dependency:

first_name → date of birth

The constraint is not generally true for all data that might come into our table. It’s only a coincidence in our data. Coincidences don’t count. To find a functional dependency you only look for constraints that are generally true, for all possible data.

Another simple example: car table

The obvious functional dependency is this:

vin → brand, model, production_year, price

The column vin determines the brand, the model, the production year and the price columns.

A careful reader might ask: what about a dependency like this:

price → price

Surely the price column determines its own value? You’re right. A dependency like this is called trivialfunctional dependency. You usually omit trivial dependencies if you’re writing up all functional dependencies in a table.

Example: students table

Not all functional dependencies have a single column in the left-hand side. Take a look at the students table.

 

Student Semester Lecture Teaching assistant
Cindy Cry 6 Databases 101 Jack Magpie
John Novak 4 Databases 101 Margaret Beettle
Allan Smith 6 Algorithms Paul Reason

 

The student determines the semester they’re currently at, so we have a functional dependency.

student → semester

But there is one more functional dependency here. Student and lecture together determine the name of the teaching assistant tutoring the student. The lecture only is not enough: some lectures have more than one teaching assistant. The student only is not enough either: students attend many different lectures. Without the lecture name we don’t know which class we want to find a teaching assistant for. We note the functional dependency like this:

student, lecture → teaching assistant

And you always have plenty of trivial dependencies:

student, semester → student
student, semester → semester
lecture, semester, teaching assistant → lecture, teaching assistant

Comments

Functional dependencies

Leave a Reply

Facebook Page

Facebook Page Loading...