If you’ve used Camunda BPM, you might have noticed our engine’s brightest feature: it’s really, really fast. If you haven’t experienced this for yourself yet, have a look at this whitepaper or read about our scalability. With the new job prioritization feature and the exponential backoff included in the new Camunda BPM 7.4.0 release, you can now improve the speed even further, among other things by tuning how efficiently your custom jobs are executed.
Imagine you have a lot of jobs (maybe 50,000) in a cluster environment, all waiting to be executed at the same time. The current behavior of the engine (Camunda BPM 7.3.0) is to acquire all these jobs in parallel and in no particular order. So the engine may execute a rather unimportant “historization” job first, while a job that is mission-critical for your business has to wait.
To make execution more efficient for your specific environment and to bring order into such situations of high job executor load, Camunda BPM 7.4.0 now enables you to set a priority for each job. This can be done very easily within Camunda Cockpit.
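To illustrate what priorities change, here is a minimal Python sketch of priority-ordered acquisition. The `Job` record and the `acquire_by_priority` function are hypothetical and not part of the Camunda API. We assume that a higher value means a more important job and that jobs with equal priority are ordered by due date.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Job:
    # Hypothetical, simplified job record; not the Camunda Job interface.
    job_id: str
    priority: int        # assumption: higher value = more important
    due_date: datetime


def acquire_by_priority(jobs, max_jobs):
    """Pick up to max_jobs jobs: highest priority first, earliest due date breaks ties."""
    ordered = sorted(jobs, key=lambda job: (-job.priority, job.due_date))
    return ordered[:max_jobs]


midnight = datetime(2016, 1, 1, 0, 0)
jobs = [
    Job("history-cleanup", priority=0, due_date=midnight),
    Job("charge-credit-card", priority=100, due_date=midnight),
    Job("send-newsletter", priority=10, due_date=midnight),
]
batch = acquire_by_priority(jobs, max_jobs=2)
print([job.job_id for job in batch])  # ['charge-credit-card', 'send-newsletter']
```

Without priorities, the historization-style job could just as well end up in the first batch; with them, it is only picked up once the more important work has been acquired.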
Furthermore, with the new release it’s also possible to set job priorities dynamically for each process instance via process variables.
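A priority that depends on process variables can be thought of as an expression that is resolved separately for each process instance. The sketch below assumes the priority is configured either as a literal such as `20` or as an expression such as `${priority}` naming a process variable. The function `resolve_priority` and the variable name `priority` are illustrative assumptions; a real engine evaluates a full expression language rather than this single pattern.

```python
import re

# Hypothetical pattern: only a single variable reference like "${priority}" is supported here.
_VARIABLE_EXPR = re.compile(r"^\$\{\s*(\w+)\s*\}$")


def resolve_priority(expression, variables):
    """Turn a configured priority expression into an integer for one process instance."""
    match = _VARIABLE_EXPR.match(expression.strip())
    if match:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"process variable {name!r} is not set")
        return int(variables[name])
    # Anything else is treated as a literal priority value.
    return int(expression)


# Two instances of the same process, started with different variables.
print(resolve_priority("${priority}", {"priority": 100}))  # 100
print(resolve_priority("${priority}", {"priority": 5}))    # 5
print(resolve_priority("20", {}))                          # 20
```

Two instances of the same process can therefore produce jobs with different priorities, simply because they were started with different variables.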
At runtime, the Camunda engine then evaluates the priority of each job as shown in this BPMN diagram:
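One way to picture such an evaluation in code is a fallback chain from the most specific setting to the most general one. This sketch assumes three possible sources (an overriding priority set for a job definition, a priority on the activity, and a priority on the process definition) plus a default of 0. The function name, the parameters, and the exact order are our assumptions for illustration.

```python
from typing import Optional

DEFAULT_PRIORITY = 0  # assumed fallback when no source defines a priority


def effective_priority(overriding: Optional[int],
                       activity: Optional[int],
                       process: Optional[int]) -> int:
    """Return the first priority that is set, checking the most specific source first."""
    for candidate in (overriding, activity, process):
        # Compare against None so that an explicit priority of 0 still counts as "set".
        if candidate is not None:
            return candidate
    return DEFAULT_PRIORITY


print(effective_priority(None, 50, 10))      # 50: the activity beats the process
print(effective_priority(200, 50, 10))       # 200: an override beats everything
print(effective_priority(None, None, None))  # 0: nothing configured
```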
But what if you have a large number of jobs with the same priority and same due date?
As already shown in Thorben’s blog post, the current way of executing multiple jobs in parallel is not very efficient. Imagine you have 4 engines in the cluster and 50,000 jobs to be executed exactly at midnight. All the engines start acquiring jobs exactly at midnight and all try to lock the same first 50 jobs, but only one of them succeeds in locking them exclusively and starts executing them. The three remaining engines can’t get any job, so three quarters of the acquisition effort in that cycle is wasted without executing a single job.
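The arithmetic behind the wasted three quarters can be checked with a small simulation. It is a deliberately simplified model, not how the job executor is implemented: every engine queries the oldest 50 unlocked jobs at the same moment, and optimistic locking lets exactly one of the competing lock attempts succeed (here simply the first engine in loop order). `simulate_parallel_acquisition` is a hypothetical name.

```python
def simulate_parallel_acquisition(total_jobs, engines, batch_size):
    """Model all engines querying the same oldest unlocked jobs at the same moment.

    Returns (cycles, successful_acquisitions, failed_acquisitions)."""
    locked_by = {}  # job id -> engine that holds the exclusive lock
    next_job = 0    # jobs are handed out oldest first, so every engine sees the same batch
    cycles = successful = failed = 0
    while next_job < total_jobs:
        cycles += 1
        batch = range(next_job, min(next_job + batch_size, total_jobs))
        for engine in range(engines):
            if any(job in locked_by for job in batch):
                # Another engine committed its lock first: this attempt is wasted.
                failed += 1
                continue
            for job in batch:
                locked_by[job] = engine
            successful += 1
        next_job = batch.stop
    return cycles, successful, failed


cycles, ok, lost = simulate_parallel_acquisition(50_000, engines=4, batch_size=50)
print(cycles, ok, lost)    # 1000 1000 3000
print(lost / (ok + lost))  # 0.75
```

With 50,000 jobs, 4 engines, and batches of 50, this model needs 1,000 acquisition cycles, and 3,000 of its 4,000 acquisition attempts fail.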
To handle this inefficiency, the new Camunda BPM 7.4.0 release provides a completely new approach: exponential backoff. The engines in your cluster no longer try to acquire jobs in parallel; instead, they do so sequentially, separated by a short delay (maybe 50 to 150 ms, the backoff). As a consequence, each node only picks jobs that are not already exclusively locked by another engine. From now on, the engines’ job acquisitions overlap only minimally (if at all), and all jobs can be executed as quickly as possible.
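The following sketch models the two ideas from the paragraph above under simplifying assumptions. `backoff_delay_ms` shows what makes a backoff exponential in this model: after each further lock conflict, the maximum wait is multiplied by a constant factor up to a cap, and a random share of it is used so that engines which collided drift apart. The simulation then assumes the engines are already spread out, so they query one after another and each skips jobs that are already locked. All names and parameter values (50 ms base, factor 2, 1,000 ms cap) are hypothetical and are not the engine’s actual configuration.

```python
import random


def backoff_delay_ms(level, base_ms=50.0, factor=2.0, max_ms=1_000.0, rng=random):
    """Wait time before the next acquisition after `level` consecutive lock conflicts."""
    if level <= 0:
        return 0.0  # no recent conflict: acquire again right away
    # The ceiling grows exponentially with the number of conflicts, up to max_ms.
    ceiling = min(base_ms * factor ** (level - 1), max_ms)
    # Pick a random delay in [ceiling / 2, ceiling] so engines stop retrying in lockstep.
    return rng.uniform(ceiling / 2, ceiling)


def simulate_staggered_acquisition(total_jobs, engines, batch_size):
    """Engines acquire one after another, so each only sees jobs nobody has locked yet.

    Returns (cycles, successful_acquisitions, conflicts)."""
    locked_by = {}  # job id -> engine that holds the exclusive lock
    next_job = 0    # oldest job that is still unlocked
    cycles = successful = conflicts = 0
    while next_job < total_jobs:
        cycles += 1
        for engine in range(engines):
            if next_job >= total_jobs:
                break  # nothing left for the remaining engines in this cycle
            batch = range(next_job, min(next_job + batch_size, total_jobs))
            if any(job in locked_by for job in batch):
                # Stays 0 here: each engine starts after the previous engine's batch.
                conflicts += 1
                continue
            for job in batch:
                locked_by[job] = engine
            successful += 1
            next_job = batch.stop
    return cycles, successful, conflicts


rng = random.Random(42)
print([round(backoff_delay_ms(level, rng=rng)) for level in range(7)])
print(simulate_staggered_acquisition(50_000, engines=4, batch_size=50))  # (250, 1000, 0)
```

In this model the same 50,000 jobs are done after 250 cycles with no conflicting lock attempts, because every engine picks up a different batch of 50 jobs in each cycle.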