Multi-Country Address Database Design

Standard

Database design is one of the most critical elements in software development. Realizing that

  1. New programming design patterns and frameworks are introduced from time to time, codes get refactored and system get revamped;
  2. New UI frameworks are introduced from time to time, presentation layer get thrown away and redesigned;
  3. However database design stays the same for as long as possible because business simply do not want to risk valuable production data and development team do not want to deal with legacy codes to accomodate the new design.

A good database design can stand against the agility of business needs. The development team is not expected to frequenty adding or modifying database (for example adding new tables and altering relationship between tables) unless it is “totally necessary”.

While “totally necessary” could be subjective depending on how well a database is designed to handle the changes, a good database design is expected to handle various changes in business requirements without major redesign of the existing database.

Case Study: Handling addresses in multi-country database

I was working for a car classified MNC that have businesses in 3 countries. We were rolling out CRM for Malaysia, then Indonesia and finally Thailand. Each country has a different address format.

Malaysia

[FLOOR] [,][BUILDING]
[STREET_NUMBER,] STREET_TYPE STREET_NAME [HOUSE_NUMBER]
[SECTION] [CITY_DISTRICT|VILLAGE]
POSTAL_CODE LOCALITY
[STATE]
MALAYSIA

Indonesia

[BUILDING]
[SUBBUILDING]
[HOUSE_NUMBER] STREET_NAME
[CITY] [TOWN] [LOCALITY] POSTALCODE
INDONESIA

Thailand

[PLOT] [HOUSE_NUMBER, VILLAGE]
[ROAD]
[SUBDISTRICT, DISTRICT]
[PROVINCE] [POSTAL CODE]
THAILAND

Following Agile principle of Just Enough and Just In Time (and a typical product manager would choose to deal with the other countries in a later time rather than taking the overhead now), development team would design a database schema to cater only for Malaysia user address format.

The following database schema design is efficient to get the job done. Address is the child of Customer. Section, District, Postcode, Locality, Province, and Country are the referencing tables for Address table. Among different location levels, the parent-child relationships have been defined to indicate the hierachy of the location levels. The database is fully normalized and has strong data integrity by defining appropriate foreign keys:

ERD

The development team proceed with loading data to the respective reference tables (Section, District, Locality, Postcode, Province, Country) and launched the CRM for Malaysia users. Soon production data start flooding into Customer table and Address table. Everyone is happy with the system.

2 months later business informed the development team, “Guys, it’s time to let our Indonesia users enjoy our amazing system!”, “Sure!” comes the reply.

When development team looked at Indonesia address format, it is quiet different from Malaysia address format. Leaving AddressLine1, AddressLine2 and CountryId field aside, the remaining fields are different.

Country-Address-Format

The existing database schema cannot fit Indonesia data in any elegant manner due to the parent-child relationship between location level and obviously it is not a good idea to have a Postcode table filled with Malaysia Postcode data, Indonesia Locality data and Thailand Province data (refers to row #3).

Not only Indonesia, even Thailand has a different format compare to Malaysia and Indonesia. The mapping will only get more complicated when more countries join the party.

In order to support Indonesia, the existing database schema need to go through moderate design change to cater for Indonesia address format. Later when system need to support Thailand address, another database schema change would take place.

Instead of continue maintaning a structured table design to map one field in a table to one business entity property, while making changes to support Indonesia (or even better when designing database schema in the first place), a hierarchical relationship could be defined for maintaining the dynamic relationship of address location levels.

Simply put, a hierarchical relationship is self referencing mechanism to dynamically define n-level of parent-child relationship.

The most common usage for hierarchical relationship is defining Employee and Manager relationship. A database could have an Employee table. Within the Employee table there is a ManagerId field and the ManagerId field has a foreign key to the same Employee table Id field. The record could keep referencing within the table itself until hitting a record where his ManagerId is null, which indicate that record is the highest level in the hierarchy. In this case, most likely the record is the CEO record.

To put it into context of Address, take a look at the following database schema design:

hierarchical-relationship

In this design:

  1. Tables are much lesser and relationship is much cleaner
  2. Development team can be assured that this design could handle any address structure in any other countries.

The above database design schema shows:

  1. There is a Customer table.
  2. There is an Address table that could have multiple records for a particular Customer (one to many relationship). The Address table has a foreign key to Location table to tell which is the smallest level of location the Customer belong to.
  3. There is a Location table that has a hierachical relationship with itself. Base on the ParentId field, you can “figure out” the full set of address.

So how do we “figure out” the hierachy?

Let’s take a look at a sample set of data in Location table:

Location

By using the following SQL query with Id = 27:

WITH LocationTree
AS ( SELECT Name, ParentId
FROM Location
WHERE Id = 27
UNION ALL
SELECT Location.Name, Location.ParentId
FROM Location
INNER JOIN LocationTree ON Location.Id = LocationTree.ParentId)
SELECT Name
FROM LocationTree
You would get result as following:
Id-27
This result shows the child parent relationship between the location. Cheras is the parent of 56100 (postcode), Selangor is the parent of Cheras and Malaysia is the parent of Selangor. In address context, Postcode 56100 is inside Cheras, Cheras is inside Selangor and Selangor is inside Malaysia.
The Id = 27 is set in Address table LocationId field to represent the lowest level of location the application could identify for the particular user.
Now if we modify the query to Id = 28, you would get a different set of location hierachy:
Id-28
Not only the hierachical relationship would work for single country, but also other countries with lesser location level. Assuming that we have a customer from Indonesia and the smallest location level we could identify is City, say Id = 22, where the Id = 22 is set in the Address record belongs the Indonesia user. By running the same query, we would get the following:
Id-22
The mechanism for hierachical relationship is simple in the address context. Howeer at the application level, the code need to set the smallest location level into LocationId in Address table. At the application level the input would need to be in the form of selection rather than free text input (else it would defeat the normalized database implementation). For example:
location-selection
Regardless of how many location level we have, or what is the sequence of the location level, a hierachical relationship will allow us to have a normalized yet flexible database schema that is resilient against business changes like supporting more countries.

An application developer that handles the programming codes for multiple countries might complain he could not map the list of address result hierachy into his object property. Imagine he has 4 levels for Malaysia, 3 levels for Indonesia and unknown level for Thailand. How would he design his codes to map what does each level means or represent for each country without having 300 lines of if-else statement? A trick for handling such complexity is through dependancy injection per country. After all, it is the application code job to give context to the data while the database job is to provide storage for the data.

One last thought…

Having a hierachical relationship would increase the flexibility of database schema, but it increases complexity in application codes and would not provide a clear one to one mapping on table field to object property. Such implementation is fundamentally different from how a typical development team would design the Address table. Such complexity is not required if a development team is designing a system to support only 1 country address format. However hierachical relationship should be given consideration if the database needs to handle multiple-country address with various combination of location level relationship.

Leave a Reply

Your email address will not be published. Required fields are marked *