Container Redeploy Algorithm
The platform provides a quick and reliable container redeploy flow to update Docker-based applications to a new version (tag). Below, we overview the internal steps of the implementation:
1. A new container based on the redeployment target template is created. System files from this container will be used to update the original node; this amounts to only about 300-600 MB of data to copy.
2. A list of paths to user data and some system-related files (iptables configuration, NFS service configuration, etc.) is gathered from the initial container.
3. A snapshot of the initial container is created via the ploop management utility.
Up to this point, the initial container remains unaffected. Afterward, all changes are applied on top of the snapshot and are merged only if every configuration step succeeds. In other words, the platform creates a return point that allows it to discard the changes and revert to the initial state upon failure.
4. The initial container is stopped and its file system is mounted.
5. A temporary folder is created in the root of the mounted filesystem. Next, user data (from the second step) is moved under this directory and all other files are erased.
These operations take place within a single filesystem (the moves are just renames), so they are relatively fast.
6. System files of the redeploy target template (from the first step) are copied into the now mostly empty filesystem, which holds just the user data in the temporary folder. Next, the container is initialized with the jem docker setup and jem docker aftercreate commands.
At this point, we have a clean working container based on the new Docker tag, with the user data kept in the temporary directory.
7. Now, the user data is restored to its original location, and the container is started via the jelinit process.
If everything is OK, the snapshot is merged, completing the redeployment process.
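For illustration, here is a minimal Python sketch of the whole flow above. It assumes an OpenVZ/Virtuozzo host where containers are managed with vzctl and their filesystems are mounted under /vz/root/<CTID>; the CTID, paths, helper names, and the exact vzctl snapshot semantics are assumptions, not the platform's actual implementation.

```python
import os
import shutil
import subprocess
import uuid

CTID = "101"                       # hypothetical container ID
ROOT = f"/vz/root/{CTID}"          # hypothetical CT filesystem mount point
TMP_DIR = "redeploy_userdata.tmp"  # hypothetical temporary folder (step 5)

def run(*cmd):
    """Run a host-side command and raise on a non-zero exit code."""
    subprocess.run(cmd, check=True)

def stash_user_data(paths):
    """Step 5: move user-data paths under the temp dir; within one
    filesystem a move is just a rename, which is why this step is fast."""
    for rel in paths:
        dst = os.path.join(ROOT, TMP_DIR, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        os.rename(os.path.join(ROOT, rel), dst)

def wipe_all_but(keep):
    """Step 5 (continued): erase everything except the temp dir."""
    for entry in os.listdir(ROOT):
        if entry == keep:
            continue
        full = os.path.join(ROOT, entry)
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)
        else:
            os.remove(full)

def restore_user_data(paths):
    """Step 7: rename user data back to its original locations."""
    for rel in paths:
        dst = os.path.join(ROOT, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        os.rename(os.path.join(ROOT, TMP_DIR, rel), dst)
    shutil.rmtree(os.path.join(ROOT, TMP_DIR))

def redeploy(user_paths, template_src):
    """Steps 3-7, with the snapshot acting as the return point."""
    snap_id = str(uuid.uuid4())
    # Step 3: disk-only snapshot of the still-untouched container.
    run("vzctl", "snapshot", CTID, "--id", snap_id, "--skip-suspend")
    try:
        run("vzctl", "stop", CTID)                        # step 4
        run("vzctl", "mount", CTID)
        stash_user_data(user_paths)                       # step 5
        wipe_all_but(TMP_DIR)
        # Step 6: copy the new template's system files in; jem docker
        # setup / aftercreate then run during container initialization.
        run("rsync", "-a", template_src + "/", ROOT + "/")
        restore_user_data(user_paths)                     # step 7
        run("vzctl", "umount", CTID)
        run("vzctl", "start", CTID)
        # Success: merge the snapshot, finalizing the redeploy.
        run("vzctl", "snapshot-delete", CTID, "--id", snap_id)
    except (subprocess.CalledProcessError, OSError):
        # Failure: discard all changes by reverting to the return point.
        subprocess.run(["vzctl", "umount", CTID])         # best effort
        run("vzctl", "snapshot-switch", CTID, "--id", snap_id)
        run("vzctl", "snapshot-delete", CTID, "--id", snap_id)
        run("vzctl", "start", CTID)
        raise
```

The key design point is that every mutation after the snapshot happens on top of the return point, so the failure branch can always discard the changes wholesale and bring the original container back.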
Redeploy with Backups (Deprecated)
The redeployment flow described below was used in PaaS versions 5.6 - 5.7.6 and has been deprecated since 5.7.7.
To ensure the safety of clients' applications during the container redeploy operation, the platform creates the appropriate container backups. Below, we overview the algorithm in detail:
1. The platform checks that there is enough disk space to create a copy of the initial container or, if a node group is being redeployed, of the whole layer.
Upon failure, the initial container remains unaffected. The appropriate notification (i.e. not enough disk space on the hardware node) is shown to users in the dashboard.
2. A new node with the required image tag is created.
Upon failure, the initial container remains unaffected. The appropriate error (e.g. registry unavailability or unsupported image OS) is shown to users in the dashboard.
3. The new node is provided with all the container settings and limits of the initial one.
Upon failure, the initial container remains unaffected. No user-facing errors are shown during this step.
4. The new node is initialized through the docker setup and docker aftercreate procedures. The initial container is stopped.
Upon failure, the initial container (unaffected) is started, while the new one is stopped and, based on the settings, is stored for analysis or removed. The appropriate error (the image could not be run) is shown to users in the dashboard, with additional info stored in the logs.
5. The initial container is switched to the mounted state, allowing the user data (volumes) to be copied to the new node via the rsync tool (see the sketch after this list).
Upon failure, the initial container (unaffected) is started, while the new one is stopped and, based on the settings, is stored for analysis or removed. The same error as in the previous step is shown to users in the dashboard, and the extended rsync error description is added to the logs.
6. The new node is tweaked and finalized (i.e. NFS configuration files, user SSH keys, firewall rules, etc. are added).
Upon failure, the initial container (unaffected) is started, while the new one is stopped and, based on the settings, is stored for analysis or removed. No user-facing errors are shown during this step.
7. Both containers are stopped, and their CTID and UID are swapped. The new node is started.
Upon failure, the IDs are swapped back, the initial container is started, and the redeploy error is shown to the user in the dashboard.
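As an illustration of step 5, here is a minimal sketch of the volume copy between the two mounted containers. The mount points and the volume list are hypothetical; only the rsync invocation itself reflects what the text describes.

```python
import subprocess

OLD_ROOT = "/vz/root/101"      # hypothetical mount point of the initial container
NEW_ROOT = "/vz/root/102"      # hypothetical mount point of the new node
VOLUMES = ["var/www", "home"]  # hypothetical user-data volumes to carry over

for rel in VOLUMES:
    # -a preserves permissions, ownership, timestamps, and symlinks;
    # --delete makes the destination an exact mirror of the source;
    # trailing slashes copy directory contents, not the directories.
    result = subprocess.run(
        ["rsync", "-a", "--delete", f"{OLD_ROOT}/{rel}/", f"{NEW_ROOT}/{rel}/"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        # Mirrors the failure handling above: the extended rsync error
        # description goes to the logs.
        raise RuntimeError(f"rsync failed for {rel}: {result.stderr.strip()}")
```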
Now, the customer sees in the dashboard that the redeployment succeeded and can work with the new node. Meanwhile, the initial container is temporarily stored as a backup, allowing its pre-redeployment state to be restored if needed.
With this flow, the initial container is touched only during the last step, and all potentially harmful configuration is performed on a separate node, ensuring that the user container can always be restored.
Restoring Container from Backup (Deprecated)
To be able to roll back a customer's container to its pre-redeployment state, you need to configure backup storage for the redeploy operation. By default, backups are stored for a week, and only the latest container update (per account) can be reverted.
If there are problems with the container/application after a successful redeploy, the user can contact your support team and request restoration of the previously used container version. To help your customer, use the following administration > cluster private API methods (which can be run by a cluster admin only):
- GetBackups (appid, session, nodeId) - returns a list of backups assigned to the specified node ID
- RestoreBackup (appid, session, nodeId, backupId) - replaces the specified container with the selected backup
Substitute the parameters as follows:
- appid - the platform identifier (i.e. cluster)
- session - admin user authentication session or access token with the appropriate permission
- nodeId - container identifier to find backups for (GetBackups) or to restore backup into (RestoreBackup)
- backupId - identifier of the backup that should be restored
The flow is simple: use the GetBackups method to learn the ID of the necessary backup, then pass it to the RestoreBackup call to roll back.
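For example, here is a minimal Python sketch of the two calls. The base URL and the response field names are assumptions, so verify them against your installation's API reference.

```python
import requests

# Hypothetical endpoint base; the exact URL layout of the
# administration > cluster private API depends on your installation.
BASE = "https://app.platform.example.com/1.0/administration/cluster/rest"

APPID = "cluster"                    # the platform identifier
SESSION = "admin-session-or-token"   # admin session or access token
NODE_ID = 12345                      # container to roll back

# 1. List the backups stored for the node.
backups = requests.get(f"{BASE}/getbackups", params={
    "appid": APPID, "session": SESSION, "nodeId": NODE_ID,
}).json()
print(backups)

# 2. Restore the chosen backup; the response shape used here is assumed.
backup_id = backups["backups"][0]["id"]
result = requests.get(f"{BASE}/restorebackup", params={
    "appid": APPID, "session": SESSION,
    "nodeId": NODE_ID, "backupId": backup_id,
}).json()
print(result)
```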