Threads are a way of distributing the processing load across several processing units. Talend can launch sub-jobs in multiple threads in order to run them in parallel. Here is one way to set up such a process.
First, isolate the processing to be run in a thread so that it can execute independently. Create a job for it, and take particular care to avoid contention between the future threads by not writing to shared resources (the same files, the same rows of a database table, etc.).
If the job connects to a database, the connection component must not use a shared connection (uncheck the box "Use or save a shared connection to a database").
It can be useful for a launched thread to know which thread it is. To that end, you can add two parameters to the context, holding the thread number and the total number of threads (named num_thread and nb_threads in this example).
The logs can then show which thread is running, for example with a message of the form "-- Starting thread 1/3 --".
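As a minimal sketch of how such a log line can be built, the Java below concatenates the two values into the message format shown in the sample output of this article. The `Context` class here is a hypothetical stand-in so the example runs on its own; in a Talend job the values would be read from the job's own context (for instance `context.num_thread` and `context.nb_threads` in a tJava component, assuming both variables are declared as integers).

```java
// Hypothetical stand-in for a Talend job's context, so the sketch runs outside Talend.
class ThreadLogSketch {

    // Assumed context variables: thread number and total number of threads.
    static final class Context {
        int num_thread = 1;
        int nb_threads = 3;
    }

    // Builds the log line identifying the current thread.
    static String startMessage(int numThread, int nbThreads) {
        return "-- Starting thread " + numThread + "/" + nbThreads + " --";
    }

    public static void main(String[] args) {
        Context context = new Context();
        // prints "-- Starting thread 1/3 --"
        System.out.println(startMessage(context.num_thread, context.nb_threads));
    }
}
```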
As a reminder that the processing will run in multiple threads, give the job an explicit name ("threadImport", for example).
Then create a second job. Place a tRunJob component that calls the first job, and repeat this as many times as the number of threads you want. With three tRunJob components, for example, 3 threads will be spawned at runtime.
In the parameters of each tRunJob, assign a value to the two variables created previously (num_thread and nb_threads), for example num_thread = 1, 2 and 3 respectively, with nb_threads = 3 in each.
In the parent Job settings, go to Extra and check Multi-threaded execution. This check box specifies that all the sub-jobs contained in the process will be launched in multi-threaded mode, and not consecutively.
Save your job and name it with an explicit name, indicating that it performs multi-threading (like "multithreadImport").
Then test the processing. You should see output close to the following, in an order that varies from run to run:
-- Starting thread 1/3 --
-- Starting thread 2/3 --
-- Starting thread 3/3 --
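The variable order comes from the threads running concurrently. The plain Java sketch below is only an analogy, not the code Talend generates: it starts three tasks in a thread pool, each printing its own start message, and the printed order may change between runs. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Analogy only: plain Java tasks standing in for sub-jobs launched in parallel.
class ParallelLaunchSketch {

    // Starts nbThreads tasks concurrently; returns their messages in task order
    // (the order in which they are printed is not guaranteed).
    static List<String> launch(int nbThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nbThreads);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int i = 1; i <= nbThreads; i++) {
                final int numThread = i;
                futures.add(CompletableFuture.supplyAsync(() -> {
                    String message = "-- Starting thread " + numThread + "/" + nbThreads + " --";
                    System.out.println(message);
                    return message;
                }, pool));
            }
            List<String> messages = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                messages.add(future.join()); // wait for each task to finish
            }
            return messages;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        launch(3);
    }
}
```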
The processing of a list can also be split between the different threads created.
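One possible way to split a list, sketched here as an assumption rather than as the method originally illustrated, is to give each thread the items whose position modulo nb_threads matches its own number. The helper `itemsForThread` and the file names are hypothetical; in a Talend job the same test could be applied with the context values num_thread and nb_threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Assumed scheme: item i goes to thread (i % nbThreads) + 1.
class ListShareSketch {

    // Returns the items that thread numThread (1-based) out of nbThreads should process.
    static <T> List<T> itemsForThread(List<T> items, int numThread, int nbThreads) {
        List<T> mine = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            if (i % nbThreads == numThread - 1) {
                mine.add(items.get(i));
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.csv", "b.csv", "c.csv", "d.csv", "e.csv");
        int nbThreads = 3;
        for (int numThread = 1; numThread <= nbThreads; numThread++) {
            System.out.println("thread " + numThread + "/" + nbThreads + " -> "
                    + itemsForThread(files, numThread, nbThreads));
        }
        // thread 1/3 -> [a.csv, d.csv]
        // thread 2/3 -> [b.csv, e.csv]
        // thread 3/3 -> [c.csv]
    }
}
```

With this scheme every item is handled by exactly one thread, so no two threads work on the same element.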