ZeuZ Test Deploy Architecture

18 Sep 2023

Do you know how your test cases are processed when you hit the Run button? Let’s dive in to understand the architecture behind the deploy engine that underpins the ZeuZ platform.

Challenges

Roughly a year and a half ago, the ZeuZ server would simply hang (100% CPU usage) or crash when you deployed 500+ test cases at a time. It would remain unavailable for more than 5 minutes at a stretch, making it unusable for practical scenarios. Our clients generally have 2-4k+ test cases and deploy 500+ of them every day via manual deploys, CI/CD or automated schedulers. This made it impossible to scale out.

To solve this challenge, I set out to write a completely new deploy engine in Go, replacing the older one written in Python. The result? We can now deploy thousands of test cases without the CPU usage going over 5-10%. I’ll talk about the intricacies and implementation details of the new engine in a separate post; this post is more of an overview of the current architecture.

Architecture

A user, the scheduler or a CI/CD pipeline triggers a new test set deployment.
The deploy endpoint handler takes the request and generates a new RunID. Note that multiple parallel calls to the deploy endpoint are fairly normal.
A RunID is a timestamp-based identifier whose smallest unit is a single “second”. This means we cannot have two RunIDs generated in the same second! To avoid generating two identical RunIDs, we hold a Redis-based distributed mutex lock and pause for 1 second every time a new RunID is generated.
Attach the generated RunIDs to the deploy requests.
Serialize the deploy requests and put them in a Redis queue (a sorted set to be precise).
Deploy Manager - an embedded service that runs an infinite loop to continuously watch for new deploy requests in the queue. As soon as a new deploy request is found, it is passed on to a background worker.
The background worker reads the test cases listed in the deploy request from the main database and passes the data to the Result Keeper.
The Result Keeper takes the test cases as they’re passed in and creates historical records in the database. To optimize the database operations, we perform a table-to-table copy for a portion of the data.
The test cases are then stored in separate queues identified by the “node id” of deploy requests.
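
The queueing steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual engine: DeployQueue stands in for the Redis sorted set, DrainOnce is a single pass of the Deploy Manager's loop, and the worker skips the database read and the Result Keeper entirely. All type, field and function names here are hypothetical.

```go
package main

import (
	"sort"
	"sync"
)

// DeployRequest is a hypothetical shape for a queued deploy request.
type DeployRequest struct {
	RunID     string
	NodeID    string
	TestCases []string // test case IDs only; the real worker loads full records from the database
	Score     int64    // ordering key, standing in for a Redis sorted-set score
}

// DeployQueue is an in-memory stand-in for the Redis sorted set.
type DeployQueue struct {
	mu    sync.Mutex
	items []DeployRequest
}

// Push inserts a request, keeping items ordered by ascending score.
// Requests with equal scores keep their insertion order.
func (q *DeployQueue) Push(r DeployRequest) {
	q.mu.Lock()
	defer q.mu.Unlock()
	i := sort.Search(len(q.items), func(i int) bool { return q.items[i].Score > r.Score })
	q.items = append(q.items, DeployRequest{})
	copy(q.items[i+1:], q.items[i:])
	q.items[i] = r
}

// PopMin removes and returns the lowest-scored request, if there is one.
func (q *DeployQueue) PopMin() (DeployRequest, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return DeployRequest{}, false
	}
	r := q.items[0]
	q.items = q.items[1:]
	return r, true
}

// NodeQueues holds one queue of test cases per node ID.
type NodeQueues struct {
	mu     sync.Mutex
	queues map[string][]string
}

func NewNodeQueues() *NodeQueues {
	return &NodeQueues{queues: make(map[string][]string)}
}

// Enqueue appends test cases to the queue of the given node.
func (n *NodeQueues) Enqueue(nodeID string, testCases ...string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.queues[nodeID] = append(n.queues[nodeID], testCases...)
}

// Snapshot returns a copy of the queue for the given node.
func (n *NodeQueues) Snapshot(nodeID string) []string {
	n.mu.Lock()
	defer n.mu.Unlock()
	return append([]string(nil), n.queues[nodeID]...)
}

// DrainOnce is one pass of the manager loop: it pops every pending request,
// hands each to a background worker goroutine, waits for the workers to
// finish and returns how many requests were dispatched.
func DrainOnce(q *DeployQueue, nodes *NodeQueues) int {
	var wg sync.WaitGroup
	dispatched := 0
	for {
		req, ok := q.PopMin()
		if !ok {
			break
		}
		dispatched++
		wg.Add(1)
		go func(r DeployRequest) {
			defer wg.Done()
			// A real worker would read the test cases from the database and
			// pass them through the Result Keeper before queueing them.
			nodes.Enqueue(r.NodeID, r.TestCases...)
		}(req)
	}
	wg.Wait()
	return dispatched
}
```

A long-running manager would call something like DrainOnce repeatedly. Because the workers run concurrently, this sketch makes no promise about the relative order of test cases from two different requests that target the same node.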

This concludes the architecture of the deployment process. After a ZeuZ node connects to the server, it asks whether test cases are available in the queue identified by its “node id”. I’ll discuss this in a separate post.

Concepts

ZeuZ Server - a test case (+project management) platform to author, manage and run both automated and manual tests.
ZeuZ Node - automation engine (agent) that runs on your desktop, cloud VM, etc. It takes instructions from the server, executes them (open an app/browser, go to a link, click on buttons, etc.) and sends the report of the execution back to the server.
RunID - each deploy request is assigned a unique timestamp-based ID to differentiate it from other deploy requests. Example: Mon-Sep-18-09:54:53-2023. We have plans to fix the one-second granularity limitation later, but at the scale we deal with, this is more than enough.