Azure WebJob with Azure Queue


A cron job is an essential part of complex systems, executing a certain script or program at a specific time interval. Traditionally, a developer or system administrator creates a Windows Scheduled Task to execute scheduled jobs within the operating system.

In one project, I used to have multiple .exe programs scheduled to update the production database at midnight for various use cases such as expiring user credit. This gets the job done within the application context, but it is not the cleanest approach when my system administrator needs to take care of 20 other cron jobs coming from different machines and different operating systems.

The next thing I implemented was to expose a cron job through a WCF API endpoint. For example, I opened a WCF API endpoint to trigger a functionality that sends email notifications. This endpoint maps users’ saved criteria against business inventory on a daily basis. (Yes, this is the annoying 9.00AM email spam notification you get every day. Sorry!) The WCF API endpoint does not do anything if no one hits it. It is a simple HTTP endpoint waiting for something to tell it to get up and work.

The reason to expose the cron job as a WCF API endpoint is to give my system administrator a centralized system to trigger and monitor all the cron jobs in one place, rather than logging into multiple servers (operating systems) to monitor and troubleshoot. This works all right, except that now I have my cron job stuck in a WCF project instead of a simple script or a lightweight .exe program.

Azure WebJob

The next option is Azure WebJobs. Azure WebJobs enable me to run programs or scripts in the web app context as background processes. They run and scale as part of Azure Web Apps. With Azure WebJobs, I can now write my cron job as a simple script or a lightweight .exe rather than a WCF service. My system administrator also gets a centralized interface, the Azure Portal, to monitor and configure all the cron jobs. In fact, it’s pretty cool that I can trigger a .exe program through a public HTTP URL using the Web Hook property of a WebJob.

Azure WebJobs go beyond the traditional cron job definition (timer based). They can run continuously, on demand, or on a schedule.

The following file types are accepted:

  • .cmd, .bat, .exe (using Windows cmd)
  • .ps1 (using PowerShell)
  • .sh (using Bash)
  • .php (using PHP)
  • .py (using Python)
  • .js (using Node)
  • .jar (using Java)

I will use C#.NET to create a few .exe programs to demonstrate how Azure WebJobs work.

Prerequisites (nice to have)

Preferably, you should have the Microsoft Azure SDK installed on your machine. At the time of writing, I’m running Visual Studio 2015 Update 3, so I have the SDK for VS2015 installed via the Web Platform Installer.

Note that this is NOT a must-have. You can still write your WebJob in any of the above-mentioned languages and upload the job manually in the Azure Portal. The reason I recommend installing the SDK is that it makes your development and deployment much easier.

Working with Visual Studio

If you have the Microsoft Azure SDK installed for Visual Studio, you will see the Azure WebJob project template. We are going to start with a simple Hello World program.

Your WebJob project will be pre-loaded with some sample code. As usual, you will still need to build it once for NuGet to resolve the packages.

The following packages are part of your packages.config if you started the project with the Azure WebJobs template. It’s no big deal if you didn’t; you can install them manually, although it’s a little tedious.

For now, we will ignore the fancy built-in SDK support for various Azure services. We will create a simple Hello World program and deploy it to Azure to get an end-to-end experience.

Remove everything else so that you are left with a Program.cs that writes a simple message:
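The original post shows this as a screenshot. A minimal sketch of what such a Program.cs might look like (the namespace and message text are illustrative only):

    using System;

    namespace HelloWebJob
    {
        class Program
        {
            // A WebJob can be as simple as a console application.
            // The console output shows up in the WebJob logs in the Azure Portal.
            static void Main()
            {
                Console.WriteLine("Hello World from my first Azure WebJob!");
            }
        }
    }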

Go to your Azure Portal. Click on Get publish profile to download your publishing profile. You will need to import it into Visual Studio later when publishing your WebJob.

Import your publish profile into your project in Visual Studio. The Import dialog will kick in the first time you publish your project as a WebJob.

Right-click on your project and select “Publish as Azure WebJob”; you will see the following dialog to set up your publishing profile.

Import the earlier downloaded publishing profile setting file into your WebJob project in Visual Studio.

Validate your connection.

Click on “Publish” to ship your application to Azure.

Upon successful publishing, you should see the message “Web App was published successfully…”

Go to your Azure portal and verify that your WebJob is indeed listed.

Select your WebJob, click on the Logs button on top to see the following page.

The impressive part about using WebJobs in Azure is the following WebJobs monitoring page. You can use this page to monitor the status of multiple WebJobs and drill down into their respective logs. No extra cost, coding, or configuration; it all works out of the box!

Now we have our first Hello World application running in Azure. We have deployed our WebJob to run continuously, which means it will get triggered automatically every 60 seconds. Once the first round is completed, the status will change to PendingRestart and wait for the next 60 seconds to kick in.

The WebJobs SDK sample project on GitHub demonstrates comprehensively how you can work with WebJobs through Azure Queue, Azure Blob, Azure Table, and Azure Service Bus. In this article, we will do a little bit more coding by using a WebJob to interact with Azure Queue.

Azure WebJob with Azure Queue Storage

The Microsoft.Azure.WebJobs namespace provides the QueueTriggerAttribute. We will use it to trigger a method in our WebJob.

It works like this: whenever a new message is added to the queue, the WebJob is triggered and picks up the message from the queue.

Before we continue with the code, we first need to create an Azure Storage account to host the queue. Here, I have a storage account named “danielfoo”.

We will use Microsoft Azure Storage Explorer to get a visual view of our storage account. It’s a handy tool for visualizing your data. If you do not have it, no worries, just imagine the queue messages in your mind 🙂

Let’s add a new console application project in our solution to put some messages in our queue.

There are two packages that you’ll need to install into your queue project:

We will write the following simple code to initialize a queue and put a message into it.
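The original shows this code as a screenshot. Here is a minimal sketch of what such a message generator might look like, assuming the two packages installed above are WindowsAzure.Storage and Microsoft.WindowsAzure.ConfigurationManager (the usual pair for classic Azure Storage access), and that the queue is named “danielqueue”:

    using System;
    using Microsoft.Azure;                       // CloudConfigurationManager
    using Microsoft.WindowsAzure.Storage;        // CloudStorageAccount
    using Microsoft.WindowsAzure.Storage.Queue;  // CloudQueueClient, CloudQueue

    namespace Queue.MessageGenerator
    {
        class Program
        {
            static void Main()
            {
                // Reads the StorageConnectionString setting from app.config.
                var storageAccount = CloudStorageAccount.Parse(
                    CloudConfigurationManager.GetSetting("StorageConnectionString"));

                // Get a reference to the queue and create it if it does not exist yet.
                CloudQueueClient queueClient = storageAccount.CreateCloudQueueClient();
                CloudQueue queue = queueClient.GetQueueReference("danielqueue");
                queue.CreateIfNotExists();

                // Drop a simple message into the queue.
                queue.AddMessage(new CloudQueueMessage("Hello from Queue.MessageGenerator at " + DateTime.Now));
                Console.WriteLine("Message added to danielqueue.");
            }
        }
    }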

Of course, you will have to configure your StorageConnectionString in app.config so that the code can pick up the connection string.

You can get your account name and key from Azure Portal.

Let’s execute our console application to test whether our queue can be created and whether a message can be placed into the queue properly.

After execution, look at Storage Explorer to verify that the message is indeed in the queue.

Now we will dequeue this message so that it will not interfere with the actual QueueTrigger in our exercise later.
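You can do this directly in Storage Explorer; purely as an illustration, the equivalent in code is just a matter of reading and deleting the message (a hedged sketch reusing the queue reference from the generator sketch above):

    // Retrieve the next message and delete it so it will not be picked up
    // by the QueueTrigger later.
    CloudQueueMessage message = queue.GetMessage();
    if (message != null)
    {
        Console.WriteLine("Dequeued: " + message.AsString);
        queue.DeleteMessage(message);
    }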

Next, we will create a new WebJob project that gets triggered whenever a message is added to the queue, by using the QueueTriggerAttribute from the Microsoft.Azure.WebJobs namespace.

This time we do not remove Functions.cs or modify Program.cs.

Make sure the method parameter in Functions.cs refers to the same queue name as the one you defined earlier in your Queue.MessageGenerator project. In this example, we are using the name “danielqueue”.
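A sketch of roughly what the template’s Functions.cs looks like once it is pointed at “danielqueue” (the method body is the template’s default logging behavior):

    using System.IO;
    using Microsoft.Azure.WebJobs;

    namespace WebJob.QueueTrigger
    {
        public class Functions
        {
            // This function is triggered whenever a new message appears on "danielqueue".
            public static void ProcessQueueMessage([QueueTrigger("danielqueue")] string message, TextWriter log)
            {
                log.WriteLine(message);
            }
        }
    }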

Program.cs
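Program.cs stays as generated by the template. For reference, a sketch of what that default host setup looks like:

    using Microsoft.Azure.WebJobs;

    namespace WebJob.QueueTrigger
    {
        class Program
        {
            static void Main()
            {
                // The JobHost discovers the functions in Functions.cs and keeps the
                // process running, polling the queue for new messages. It reads the
                // AzureWebJobsStorage and AzureWebJobsDashboard connection strings from App.config.
                var config = new JobHostConfiguration();
                var host = new JobHost(config);
                host.RunAndBlock();
            }
        }
    }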

Remember to fill in the following connection string in your App.config. This is how the WebJob knows which storage account to monitor.

Now, let’s start the WebJob.QueueTrigger project as a new instance and let it wait for a new message to be added to “danielqueue”.

Then, we will start the Queue.MessageGenerator project as a new instance to drop a message into the queue for WebJob.QueueTrigger to pick up.

Yes! Our local debug session has detected that a new message was added to “danielqueue” and has hit the ProcessQueueMessage function.

Let’s publish WebJob.QueueTrigger to Azure to see it process the queue message in the Azure context instead of on the local machine. After successful publishing, we now have 2 WebJobs.

Select QueueTrigger (the WebJob we just published) and click on the Logs button at the top. You will see the following log of queue message processing.

If you drill down into a particular message, you will be redirected to the Invocation Details page.

We have just set up our WebJob to work with Azure Queue!

That wraps up everything I want to show you in working with Azure WebJob and Azure Queue.

Obviously, in reality you will write something more complex in your WebJob than simply outputting the log. You may write some logic to perform a certain task. You may even use it to trigger another, more complex job sitting in another service.

In the queue, obviously you also wouldn’t write a literal “message” like I did. You will probably create one queue for a very specific purpose. For example, you might create a queue to store a list of IDs, where each ID is required by another type of process such as indexing. The consumer then indexes the entities (represented by the IDs) in batches (let’s say 4 messages at a time) instead of creating a large surge of load in a short period of time.

A few more thoughts…

  1. By default, JobHostConfiguration.QueuesConfiguration.BatchSize handles 16 queue messages concurrently. I recommend overriding the default with a smaller value (let’s say 4) to ensure that the other end doing the heavier processing (for example, indexing a document in Solr or Azure Search) is able to handle the load. The maximum value for JobHostConfiguration.QueuesConfiguration.BatchSize is 32. If having the WebJob handle 32 messages at a go is not sufficient for you, you can further tweak the performance by setting a short JobHostConfiguration.QueuesConfiguration.MaxPollingInterval to make sure you do not accumulate too many messages before processing kicks in. (A configuration sketch follows after this list.)
  2. If for whatever reason you have maxed out the built-in configuration (such as BatchSize and MaxPollingInterval) and it is still not good enough, a quick win is to scale up your Web App. Note that you cannot scale your WebJob alone because the WebJob sits within the context of the Web App. If scaling up the Web App for the sake of the WebJob sounds like an inefficient approach, consider migrating your jobs to a Worker Role.
  3. WebJobs are good for lightweight processing. They are good for tasks that only need to run periodically, on a schedule, or on a trigger. They are cheap and easy to set up and run. Worker Roles are good for more resource-intensive workloads, or if you need to modify the environment they run in (for example, the .NET Framework version). Worker Roles are more expensive and slightly more difficult to set up and run, but they offer significantly more power when you need to scale. There is a pretty comprehensive blog post by Kloud comparing WebJobs and Worker Roles.
  4. Azure Storage Queue has no guarantee on message ordering. In other words, a message placed into the queue first does not necessarily get processed first. Delivery for Azure Storage Queue is At-Least-Once but not At-Most-Once; in other words, a message may potentially be processed more than once, and the application code will need to handle the duplication after a message is picked up. If this troubles you, you should consider Service Bus Queue, where ordering is First-In-First-Out (FIFO) and delivery is At-Least-Once and At-Most-Once. If you are wondering why people still use Azure Storage Queue, it is because Storage Queue is designed to handle super-large-scale queuing. For example, the maximum queue size for Storage Queue is 200 TB while a Service Bus Queue is 1 GB to 80 GB; the maximum number of queues for Storage Queue is unlimited while for Service Bus it is 10,000. For a complete comparison, please refer to the Microsoft documentation.
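As mentioned in the first point above, these settings can be overridden when constructing the host. A minimal sketch (the SDK exposes them as JobHostConfiguration.Queues.BatchSize and JobHostConfiguration.Queues.MaxPollingInterval; the values below are illustrative):

    using System;
    using Microsoft.Azure.WebJobs;

    class Program
    {
        static void Main()
        {
            var config = new JobHostConfiguration();

            // Process fewer messages concurrently so the downstream system
            // (e.g. Solr or Azure Search indexing) is not overwhelmed.
            // The default is 16 and the maximum is 32.
            config.Queues.BatchSize = 4;

            // Poll more frequently so messages do not accumulate between polls.
            config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(10);

            var host = new JobHost(config);
            host.RunAndBlock();
        }
    }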

I hope you have enjoyed reading this article. If you find it useful, please share it with your friends who might benefit from this. Cheers!

Tech Talk 2016


2016 has been a fruitful year for me in software development. Apart from my day job in Sitecore as a lead developer, I also had a lot of fun with services in the Azure cloud in my spare time.

Another new “adventure” I tried in 2016 was speaking at Tech Talks in local tech communities – from my own office, a university and the Microsoft office, to being an online panelist in a Google Hangout discussion.


Scaling SQL Server @ Microsoft Malaysia Level 26, Tower 3


Software Industry Career Advice @ Multimedia University, Computing Faculty Lecture Hall, Cyberjaya


Working with Azure Search @ Sitecore Malaysia, Level 18

Standardizing DevOps Across Organization @ Continuous Discussions (#c9d9) Google Hangout

It has been a rewarding experience to contribute and make a difference in the tech communities through these Tech Talks. It amazed me when some audience members asked follow-up questions based on what I had shared earlier. It is truly satisfying to learn that I have inspired some of the audience with useful information or ideas they can benefit from.

It has been an honor to share the stage with many knowledgeable speakers. As much as I enjoyed sharing, I have learned just as much from them. Thanks to those who invited me over to speak, those who helped make the Tech Talks possible, and those who came and supported me. Thank you! I am truly lucky to have you amazing people as friends in the tech communities.

Signing off for 2016… It has been a fantastic year, looking forward to 2017!

Continuous Integration and Continuous Delivery with NuGet


Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository. Each commit is then verified by an automated build and, sometimes, by automated tests.

Why is Continuous Integration important? If you have been programming in a team, you have probably encountered the situation where one developer commits code that breaks every developer’s code base. It can be extremely painful to isolate the code that broke the build. Continuous Integration serves as a preventive measure: it builds the latest code base to verify whether there are any breaking changes. If there are, it raises an alert, perhaps by sending an email to the developer who last committed the code, notifying the whole development team, or even rejecting the commit. If there isn’t any breaking change, CI proceeds to run a set of unit tests to ensure the last commit has not modified any logic in an unexpected manner. This process is sometimes also known as Gated CI, which guarantees the sanity of the code base within a relatively short period of time (usually a few minutes).


The idea of Continuous Integration goes beyond validating the code base a team of developers is working on. If the code base utilizes other development teams’ components, it is also about continuously pulling the latest components to build against the current code base. If the code base utilizes other micro-services, then it is about continuously connecting to the latest version of those micro-services. On the other hand, if the code base’s output is utilized by other development teams, it is also about continuously delivering the output so that other teams can pull the latest version to integrate with. If the code base’s output is a micro-service, then it is about continuously exposing the latest micro-service so that other micro-services can connect and integrate with the latest version. The process of delivering the output for other teams to utilize leads us to another concept known as Continuous Delivery.

Continuous Delivery (CD) is a development practice where the development team builds software in such a manner that the latest version can be released to production at any time. The delivery could mean the software being delivered to a staging or pre-production server, or simply to a private development NuGet feed.

Why is Continuous Delivery important? With today’s fast pace of change in software development, stakeholders and customers wanted all the features yesterday. Product Managers do not want to wait a week for the team to “get ready” to release. The business expectation is that as soon as the code is written and the functionality is tested, the software should be READY to ship. Development teams must establish an efficient delivery process where delivering software is as simple as pushing a button. A good benchmark is that the delivery can be accomplished by anyone in the team, perhaps by a QA after he has verified the quality of the deliverable, or by the Product Manager when he thinks the time is right. In a complex enterprise system, it is not always possible to ship code to production quickly. Therefore, a complex enterprise system is often broken into smaller components or micro-services. In this case, the components or micro-services must be ready to be pushed to a shared platform so that other components or micro-services can consume the deliverable as soon as it is available. This delivery process must be in a READY state at all times. Whether to deliver the whole system or a smaller component should be a matter of business decision.

Note that Continuous Delivery does not necessarily mean Continuous Deployment. Continuous Deployment is where every change goes through the pipeline and automatically gets pushed into production. This could lead to several production deployments every day, which is not always desirable. Continuous Delivery allows the development team to do frequent deployments but they may choose not to. In today’s standard .NET development, a NuGet package is commonly used for delivering either a component or a whole application.

NuGet is the package manager for the Microsoft development platform. A NuGet package is a set of well-managed libraries and the relevant files. NuGet packages can be installed and added to a .NET solution from the GUI or the command line. Instead of referencing individual libraries in the form of .dll files, developers can reference a NuGet package, which provides much better management of dependencies and assembly versions. In a more holistic view, a NuGet package can even be an application deliverable by itself.

Real life use cases

Example 1: Micro-services

In a cloud-based (software-as-a-service) solution, domains are encapsulated in their respective micro-services. Every development team is responsible for its own micro-services.


Throughout the Sprint, developers commit code into TFS. After every commit, TFS builds the latest code base. Once the build process is completed, unit tests are executed to ensure existing logic is still intact. Several NuGet packages are then generated to represent the micro-services (WCF, web application, etc.). These services are deployed by a deployment tool known as Octopus Deploy to a Staging environment (hosted in AWS EC2) for QA to perform testing. This process continues until the last User Story is completed by the developers.

In a matter of clicks, the earlier NuGet packages can also be deployed to the Pre-production environment (hosted in AWS EC2) for other types of testing. Lastly, with the blessing of the Product Manager, the DevOps team uses the same deployment tool to promote the same NuGet packages that were tested by QA earlier into Production. Throughout this process, it is very important that there is no manual intervention by hand (such as copying a dll or changing a configuration) to ensure the integrity of the NuGet packages and the deployment process. The entire delivery process must be pre-configured or pre-scripted to ensure it is consistent, repeatable, and robust.

Example 2: Components

In a complex enterprise application, functionalities are split into components. Each component is a set of binaries (dll) and other relevant files. A component is not a stand-alone application; it has no practical usage until it sits on the larger platform. Development teams are responsible for their respective components.

Throughout the Sprint, developers commit code into a Git repository. The repository is monitored by TeamCity (the build server). TeamCity pulls the latest changes and executes a set of PowerShell scripts. From the PowerShell scripts, an instance of the platform is set up. The latest code base is built and the output is placed on top of the platform. Various tests are executed on the platform to ensure the component’s functionality is intact. Then, a set of NuGet packages is generated from the PowerShell scripts and published as the build artifacts. These artifacts are used by QA to run other forms of tests. This process continues until the last User Story is completed by the developers.

When QA gives the green light and with the blessing of the Product Manager, the NuGet packages are promoted to ProGet (an internal NuGet feed). This promotion process happens in a matter of clicks. No manual intervention (modifying the dependencies, version, etc.) should happen, to ensure the integrity of the NuGet packages.

Once the NuGet package is promoted / pushed into ProGet, other components update to this latest component in their own code bases. In Scaled Agile, a release train is planned on a frequent and consistent time frame; internal releases happen on a weekly basis. This weekly build always pulls all of the latest components from ProGet to generate a platform installer.

Summary

From the examples, we can tell that Continuous Integration and Continuous Delivery are fairly simple concepts. There is neither black magic nor rocket science in either use case. The choice of tools and approaches largely depends on the nature of the software we are building. While designing software, it is always a good idea to keep Continuous Integration and Continuous Delivery in mind to maximize team productivity and to achieve quick and robust delivery.

Json Data in SQL Server


The rise of NoSQL databases such as MongoDB is largely due to the agility of storing data in a non-structured format. A fixed schema is not required, unlike traditional relational databases such as SQL Server.

However, a NoSQL database such as MongoDB is not a full-fledged database system; it is designed for very specific use cases. If you don’t know why you need NoSQL in your system, chances are you don’t. Those who find it essential to use a NoSQL database often use it only for a certain portion of their system, and then use an RDBMS for the remaining parts that have more traditional business use cases.

Wouldn’t it be nice if an RDBMS were able to support a similar data structure – having the ability to store flexible data formats without altering database tables?

Yes, it is possible. For years, software developers have been storing various JSON data in a single table column. Developers then make use of a library such as Newtonsoft.Json within the application (data access layer) to deserialize the data and make sense of the JSON.

Reading / Deserializing JSON
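The original code is shown as a screenshot. A minimal sketch of the kind of deserialization described here, assuming a table column holding JSON with Name and Genres fields (the class and property names are illustrative):

    using System;
    using System.Collections.Generic;
    using Newtonsoft.Json;

    public class Album
    {
        public string Name { get; set; }
        public List<string> Genres { get; set; }
    }

    class Demo
    {
        static void Main()
        {
            // In practice this string would come from a table column read through the data access layer.
            string json = "{\"Name\":\"Sample Album\",\"Genres\":[\"Rock\",\"Jazz\"]}";

            // The whole JSON document is deserialized even if we only need the Name field.
            Album album = JsonConvert.DeserializeObject<Album>(json);
            Console.WriteLine(album.Name);
        }
    }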

This works. However, the JsonConvert.DeserializeObject method is working extremely hard, deserializing the whole JSON document only to retrieve a simple field such as Name.

Imagine there is a requirement to search certain Genres in a table that has 1 million rows of records: the application code will have to read all 1 million rows and then perform the filtering on the application side. Bad for performance. Now imagine you have a more complex data structure than the example above…

The searching mechanism would be much more efficient if developers could pass a query (SQL statement) to the database to handle the filtering. Unfortunately, SQL Server does not support querying JSON data out of the box.

It was impossible to directly query JSON data in SQL Server until the introduction of a library known as JSON Select. JSON Select allows you to write SQL statements to query JSON data directly in SQL Server.

How JSON Select Works

First, you need to download an installer from their website. When you run the installer, you need to specify the database you wish to install this library into:


What this installer essentially does is create 10 functions in the targeted database. You can see the functions at:

SSMS > Databases > [YourTargetedDatabase] > Programmability > Functions > Scalar-valued Functions


Next, you can start pumping some JSON data into your table to test it out.

I create a Student table with the following structure for my experiment:


In my StudentData column, I enter multiple rows of records in the following structure:

For the purpose of demonstrating queries, I have entered multiple rows as follows:


If you want to write a simple statement to read the list of student names from the JSON data, you can simply write:

You will get the following result in SSMS:


How about a more complex query? Does it work with aggregate functions?

If you want to find out how many students come from each city and what their average age is, you can write your SQL statement as follows:

You will get the following result in SSMS:


It appears the library allows you to query any JSON data in your table columns using normal T-SQL syntax. The only difference is that you need to make use of the predefined scalar-valued functions to wrap around the values you want to retrieve.

A Few Last Thoughts…

  1. The good thing about this library is that it allows developers to have a hybrid form of storage (NoSQL and relational) under one roof – minus the deserialization code at the application layer. Developers can continue using the classical RDBMS for typical business use cases and leverage the functions provided by the library to deal with JSON data.
  2. The bad thing about this library is that it lacks a proven track record and commercial use cases to demonstrate its robustness and stability.
  3. Although the library is not free, the license cost is relatively affordable at AU$50. The library is also free for evaluation.
  4. SQL Server 2016 provides native support for JSON data. This library is only useful for SQL Server 2005 to 2014, where upgrading to 2016 is not a feasible option.