Database design is one of the most critical elements in software development. Realizing that
- New programming design patterns and frameworks are introduced from time to time, codes get refactored and system get revamped;
- New UI frameworks are introduced from time to time, presentation layer get thrown away and redesigned;
- However database design stays the same for as long as possible because business simply do not want to risk valuable production data and development team do not want to deal with legacy codes to accomodate the new design.
A good database design can stand against the agility of business needs. The development team is not expected to frequenty adding or modifying database (for example adding new tables and altering relationship between tables) unless it is “totally necessary”.
While “totally necessary” could be subjective depending on how well a database is designed to handle the changes, a good database design is expected to handle various changes in business requirements without major redesign of the existing database.
Case Study: Handling addresses in multi-country database
I was working for a car classified MNC that have businesses in 3 countries. We were rolling out CRM for Malaysia, then Indonesia and finally Thailand. Each country has a different address format.
[STREET_NUMBER,] STREET_TYPE STREET_NAME [HOUSE_NUMBER]
[CITY] [TOWN] [LOCALITY] POSTALCODE
[PLOT] [HOUSE_NUMBER, VILLAGE]
[PROVINCE] [POSTAL CODE]
Following Agile principle of Just Enough and Just In Time (and a typical product manager would choose to deal with the other countries in a later time rather than taking the overhead now), development team would design a database schema to cater only for Malaysia user address format.
The following database schema design is efficient to get the job done. Address is the child of Customer. Section, District, Postcode, Locality, Province, and Country are the referencing tables for Address table. Among different location levels, the parent-child relationships have been defined to indicate the hierachy of the location levels. The database is fully normalized and has strong data integrity by defining appropriate foreign keys:
The development team proceed with loading data to the respective reference tables (Section, District, Locality, Postcode, Province, Country) and launched the CRM for Malaysia users. Soon production data start flooding into Customer table and Address table. Everyone is happy with the system.
2 months later business informed the development team, “Guys, it’s time to let our Indonesia users enjoy our amazing system!”, “Sure!” comes the reply.
When development team looked at Indonesia address format, it is quiet different from Malaysia address format. Leaving AddressLine1, AddressLine2 and CountryId field aside, the remaining fields are different.
The existing database schema cannot fit Indonesia data in any elegant manner due to the parent-child relationship between location level and obviously it is not a good idea to have a Postcode table filled with Malaysia Postcode data, Indonesia Locality data and Thailand Province data (refers to row #3).
Not only Indonesia, even Thailand has a different format compare to Malaysia and Indonesia. The mapping will only get more complicated when more countries join the party.
In order to support Indonesia, the existing database schema need to go through moderate design change to cater for Indonesia address format. Later when system need to support Thailand address, another database schema change would take place.
Instead of continue maintaning a structured table design to map one field in a table to one business entity property, while making changes to support Indonesia (or even better when designing database schema in the first place), a hierarchical relationship could be defined for maintaining the dynamic relationship of address location levels.
Simply put, a hierarchical relationship is self referencing mechanism to dynamically define n-level of parent-child relationship.
The most common usage for hierarchical relationship is defining Employee and Manager relationship. A database could have an Employee table. Within the Employee table there is a ManagerId field and the ManagerId field has a foreign key to the same Employee table Id field. The record could keep referencing within the table itself until hitting a record where his ManagerId is null, which indicate that record is the highest level in the hierarchy. In this case, most likely the record is the CEO record.
To put it into context of Address, take a look at the following database schema design:
In this design:
- Tables are much lesser and relationship is much cleaner
- Development team can be assured that this design could handle any address structure in any other countries.
The above database design schema shows:
- There is a Customer table.
- There is an Address table that could have multiple records for a particular Customer (one to many relationship). The Address table has a foreign key to Location table to tell which is the smallest level of location the Customer belong to.
- There is a Location table that has a hierachical relationship with itself. Base on the ParentId field, you can “figure out” the full set of address.
So how do we “figure out” the hierachy?
Let’s take a look at a sample set of data in Location table:
By using the following SQL query with Id = 27:
AS ( SELECT Name, ParentId
WHERE Id = 27
SELECT Location.Name, Location.ParentId
INNER JOIN LocationTree ON Location.Id = LocationTree.ParentId)
An application developer that handles the programming codes for multiple countries might complain he could not map the list of address result hierachy into his object property. Imagine he has 4 levels for Malaysia, 3 levels for Indonesia and unknown level for Thailand. How would he design his codes to map what does each level means or represent for each country without having 300 lines of if-else statement? A trick for handling such complexity is through dependancy injection per country. After all, it is the application code job to give context to the data while the database job is to provide storage for the data.
One last thought…
Having a hierachical relationship would increase the flexibility of database schema, but it increases complexity in application codes and would not provide a clear one to one mapping on table field to object property. Such implementation is fundamentally different from how a typical development team would design the Address table. Such complexity is not required if a development team is designing a system to support only 1 country address format. However hierachical relationship should be given consideration if the database needs to handle multiple-country address with various combination of location level relationship.