Computer science thoughts

Tips on Talend

Some techniques and code snippets that can help on Talend projects.

Show globalMap content

The globalMap variable is a map (java.util.Map) that stores, in memory, the data of the job currently running. Here's how to display its content:

java.util.Iterator<String> it = globalMap.keySet().iterator();
// Loop over every key stored in globalMap
while (it.hasNext()) {
    String key = it.next();
    String value = globalMap.get(key) == null ? "null" : globalMap.get(key).toString();
    System.out.println(" - " + key + " = " + value);
}

To display the same values sorted by key in ascending order:

java.util.Iterator<String> it = globalMap.keySet().iterator();
java.util.List<String> liste = new java.util.ArrayList<String>();
while (it.hasNext()) {
    String key = it.next();
    liste.add(key + " = " + (globalMap.get(key) == null ? "<null>" : globalMap.get(key).toString()));
}
// Sort the "key = value" lines in ascending order
java.util.Collections.sort(liste);

for (int i = 0; i < liste.size(); i++)
    log.info(" - {}", liste.get(i));
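As a variation, a TreeMap can do the sorting by key. The sketch below is self-contained so it can be tried outside Talend: the class name GlobalMapDump and the sample entries are illustrative assumptions, and a plain HashMap stands in for globalMap.

```java
class GlobalMapDump {
    // Returns one "key = value" line per entry, sorted by key in ascending order.
    static java.util.List<String> sortedLines(java.util.Map<String, Object> map) {
        // A TreeMap iterates over its keys in natural (ascending) order.
        java.util.Map<String, Object> sorted = new java.util.TreeMap<>(map);
        java.util.List<String> lines = new java.util.ArrayList<>();
        for (java.util.Map.Entry<String, Object> entry : sorted.entrySet()) {
            Object value = entry.getValue();
            lines.add(entry.getKey() + " = " + (value == null ? "<null>" : value.toString()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for globalMap (assumption); in a job, pass globalMap itself.
        java.util.Map<String, Object> globalMap = new java.util.HashMap<>();
        globalMap.put("NbLignes", 7);
        globalMap.put("id_classe", null);
        for (String line : sortedLines(globalMap)) {
            System.out.println(" - " + line);
        }
    }
}
```

Sorting on the keys rather than on the formatted lines keeps the order independent of the values.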

Add an incremented number on each line of a stream

  • Add a tMap to the job
  • Use this expression for the output column that holds the sequence number: Numeric.sequence("order",1,1)
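To make the behaviour of a named sequence concrete, here is a minimal sketch of the idea. It is not Talend's implementation: the class SequenceSketch and its method next are hypothetical, and it assumes that the first call for a given name returns the start value and each later call adds the step.

```java
class SequenceSketch {
    // One counter per sequence name (illustrative stand-in, not Talend code).
    private static final java.util.Map<String, Integer> COUNTERS = new java.util.HashMap<>();

    // Assumed behaviour: first call returns 'start', later calls add 'step'.
    static int next(String name, int start, int step) {
        Integer current = COUNTERS.get(name);
        int value = (current == null) ? start : current + step;
        COUNTERS.put(name, value);
        return value;
    }

    public static void main(String[] args) {
        // Three rows of a flow would receive three consecutive numbers.
        for (int row = 0; row < 3; row++) {
            System.out.println(next("order", 1, 1));
        }
    }
}
```

Because the counter is keyed by name, two columns that use different sequence names are numbered independently.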

Keep only last line on duplicate lines

tUniqRow keeps only the first row of each group of duplicate rows. To keep the last one instead, reverse the order of the rows before removing duplicates:

  • Add a field containing an incremented number on each line (previous tip)
  • Perform a reverse sort on the added field (tSortRow)
  • Add a tUniqRow
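The three steps above amount to the following logic, sketched here in plain Java outside Talend. The class KeepLastDuplicate is hypothetical, rows are modelled as String arrays, and the duplicate key is assumed to be the first column.

```java
class KeepLastDuplicate {
    // For each key (column 0), keeps only the last row found in the input.
    static java.util.List<String[]> keepLast(java.util.List<String[]> rows) {
        java.util.Set<String> seen = new java.util.HashSet<>();
        java.util.List<String[]> kept = new java.util.ArrayList<>();
        // Walking backwards is equivalent to numbering the rows and sorting descending.
        for (int i = rows.size() - 1; i >= 0; i--) {
            String[] row = rows.get(i);
            // Keep the first occurrence of each key in this reversed order.
            if (seen.add(row[0])) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        java.util.List<String[]> rows = new java.util.ArrayList<>();
        rows.add(new String[] {"A", "first"});
        rows.add(new String[] {"B", "only"});
        rows.add(new String[] {"A", "last"});
        for (String[] row : keepLast(rows)) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

The result comes out in reverse order, as it would after the tSortRow; sort again on the added number if the original order matters.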

Increment a counter and retrieve its value

// Counter initialization
globalMap.put("NbLignes", 0);
// Increase value
globalMap.put("NbLignes", ((Integer)globalMap.get("NbLignes"))+1);
// Get value
Integer compteur=((Integer)globalMap.get("NbLignes"));

Retrieve specific data from a sub-job

  • In the sub-job:
    • Create a context variable: globalMapParent (type Object)
    • Add a tJavaRow with the following code:
      // Assign the value
      if (context.globalMapParent instanceof java.util.Map)
          ((java.util.Map) context.globalMapParent).put("id_classe", input_row.id);
  • In the main job:
    • Add a tJava with this code:
      // Create a shared map that the sub-job will fill
      globalMap.put("globalMapEnfant", new java.util.concurrent.ConcurrentHashMap());
    • On the tRunJob, set the context parameter:
      globalMapParent = globalMap.get("globalMapEnfant");
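The mechanism can be tried outside Talend with the sketch below. The class SharedMapSketch and its two methods are hypothetical stand-ins for the sub-job and the main job, and the sketch assumes both run in the same JVM, since an object reference cannot be passed to a separate process. It also shows how the main job could read the value back once the sub-job has finished.

```java
class SharedMapSketch {
    // Plays the role of the sub-job: writes into the map received as a context value.
    static void childJob(Object globalMapParent, int id) {
        if (globalMapParent instanceof java.util.Map) {
            @SuppressWarnings("unchecked")
            java.util.Map<String, Object> shared = (java.util.Map<String, Object>) globalMapParent;
            shared.put("id_classe", id);
        }
    }

    // Plays the role of the main job: creates the map, runs the child, reads the result.
    static Object runParent(int id) {
        java.util.Map<String, Object> globalMap = new java.util.HashMap<>();
        globalMap.put("globalMapEnfant", new java.util.concurrent.ConcurrentHashMap<String, Object>());
        childJob(globalMap.get("globalMapEnfant"), id);
        // After the child has run, the value is available in the shared map.
        return ((java.util.Map<?, ?>) globalMap.get("globalMapEnfant")).get("id_classe");
    }

    public static void main(String[] args) {
        System.out.println("id_classe = " + runParent(12));
    }
}
```

Note that ConcurrentHashMap rejects null values, so a null id would have to be handled before the put.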

Apply a Run If condition on each line of a flow

If you put a Run If condition directly on a stream, it is not evaluated for each row: only the last row of the stream is processed by the condition. The processing is then split into two parts, the first before the Run If and the second after.

To apply a condition to each line, use the tFlowToIterate component, connected to a starting component (a tJava, for example), then place the Run If link between the two. The diagram below shows the situation:

The complete stream will thus be processed from start to finish, applying the Run If on each line.

To retrieve the data read, check the "Use the default (key, value) in global variables" box on tFlowToIterate. The data is then accessible under the name of the flow: in the example above, with globalMap.get("row1.XXX"), where XXX is the name of the column.
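A Run If expression is a Java boolean expression, so it can test the values that tFlowToIterate stores. The sketch below imitates this outside Talend: the class RunIfSketch, the column name id and the threshold 10 are illustrative assumptions, and a HashMap stands in for globalMap.

```java
class RunIfSketch {
    // Mirrors a Run If expression evaluated once per iteration.
    static boolean condition(java.util.Map<String, Object> globalMap) {
        Object id = globalMap.get("row1.id");
        // The instanceof test also guards against a missing or null value.
        return id instanceof Integer && ((Integer) id) > 10;
    }

    public static void main(String[] args) {
        java.util.Map<String, Object> globalMap = new java.util.HashMap<>();
        // tFlowToIterate would overwrite this entry for each row of the flow.
        for (int id : new int[] {5, 11, 20}) {
            globalMap.put("row1.id", id);
            System.out.println(id + " -> " + condition(globalMap));
        }
    }
}
```

In the Run If link itself, only the boolean expression would be entered, for example ((Integer) globalMap.get("row1.id")) > 10.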

Run a SQL query

The tDBRow component is used to execute specific queries. If you need to do the same from Java code, the following snippet retrieves the previously opened connection and executes a query:

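// Retrieve the connection opened by tDBConnection_1
java.sql.Connection connection = (java.sql.Connection) globalMap.get("conn_tDBConnection_1");
// try-with-resources closes the statement even if the query fails
try (java.sql.Statement statement = connection.createStatement()) {
    statement.execute("ALTER SEQUENCE ma_table RESTART WITH 1;");
}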

Sort rows by date with NULL at beginning or end of list

The tSortRow component sorts the stream on a given field. However, it offers no direct option to place NULL values at the start or at the end of the list. Here's how to handle it for a date field:

  • When sorting by decreasing dates, to have the NULL at the end of the list, specify "Sort Date"
  • When sorting by decreasing dates, to have the NULL at the start of the list, specify "Sort Alpha"
  • When sorting by increasing dates, to have the NULL at the end of the list, specify "Sort Alpha"
  • When sorting by increasing dates, to have the NULL at the start of the list, specify "Sort Date"
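When the sort is done in Java code rather than with tSortRow, the four cases above can be expressed directly with the standard Comparator.nullsFirst and Comparator.nullsLast methods. A minimal sketch, in which the class NullDateSort and its parameters are illustrative assumptions:

```java
class NullDateSort {
    // Sorts a copy of the list, with NULL values placed first or last as requested.
    static java.util.List<java.util.Date> sort(java.util.List<java.util.Date> dates,
                                               boolean ascending, boolean nullsFirst) {
        java.util.Comparator<java.util.Date> order = java.util.Comparator.naturalOrder();
        if (!ascending) {
            order = order.reversed();
        }
        // Wrap the date comparator so that null never reaches it.
        java.util.Comparator<java.util.Date> withNulls;
        if (nullsFirst) {
            withNulls = java.util.Comparator.nullsFirst(order);
        } else {
            withNulls = java.util.Comparator.nullsLast(order);
        }
        java.util.List<java.util.Date> sorted = new java.util.ArrayList<>(dates);
        sorted.sort(withNulls);
        return sorted;
    }

    public static void main(String[] args) {
        java.util.List<java.util.Date> dates = java.util.Arrays.asList(
                new java.util.Date(2000L), null, new java.util.Date(1000L));
        System.out.println(sort(dates, false, false)); // decreasing dates, NULL at the end
    }
}
```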

Show the number of rows being processed in logs

Here is a technique to display, in the logs or the console, the number of rows processed so far, in order to give feedback during long processing. When reading a query that returns a lot of results, it is better to use a cursor (on a tDBInput, check "Use a cursor").

  • Create the following 4 context variables:
    Name                 Type  Comment
    num_total_rows       LONG  Total number of rows
    num_processed_rows   INT   Number of rows processed in the current sequence
    num_rows_per_seq     INT   Number of rows processed between 2 measurements
    date_sequence_start  LONG  Measurement start date
  • In the prejob or at the start of processing, initialize the values of these variables
  • At the point where you want to display the number of rows, add a tJavaRow, generate the code that copies the fields from the input flow, then add the following code:

    // Assumes the counters are incremented for each row before this test
    if (context.num_processed_rows % context.num_rows_per_seq == 0) {
        log.info(" - {} rows, {} rows/sec", context.num_total_rows, Math.round(context.num_processed_rows / ((System.currentTimeMillis() - context.date_sequence_start + 0.001) / 1000)));
    }


The log function uses Log4J2 (it must therefore have been added to the project). Every num_rows_per_seq rows read, it displays the total number of rows and the reading speed of the current sequence.
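The source snippet does not show how the counters evolve, so here is one possible sketch of the complete logic, written as a plain class that can be tried outside Talend. The class ProgressSketch is hypothetical, and the per-row increments and the reset at the end of each sequence are assumptions derived from the descriptions of the variables.

```java
class ProgressSketch {
    long numTotalRows = 0;       // plays the role of context.num_total_rows
    int numProcessedRows = 0;    // plays the role of context.num_processed_rows
    final int numRowsPerSeq;     // plays the role of context.num_rows_per_seq
    long dateSequenceStart;      // plays the role of context.date_sequence_start

    ProgressSketch(int numRowsPerSeq, long startMillis) {
        this.numRowsPerSeq = numRowsPerSeq;
        this.dateSequenceStart = startMillis;
    }

    // Call once per row; returns a log line every numRowsPerSeq rows, otherwise null.
    String onRow(long nowMillis) {
        numTotalRows++;
        numProcessedRows++;
        if (numProcessedRows % numRowsPerSeq != 0) {
            return null;
        }
        // The added 0.001 avoids a division by zero when no time has elapsed.
        long rate = Math.round(numProcessedRows / ((nowMillis - dateSequenceStart + 0.001) / 1000));
        String line = " - " + numTotalRows + " rows, " + rate + " rows/sec";
        // Start a new measurement sequence (assumption).
        numProcessedRows = 0;
        dateSequenceStart = nowMillis;
        return line;
    }

    public static void main(String[] args) {
        ProgressSketch progress = new ProgressSketch(1000, System.currentTimeMillis());
        for (int i = 0; i < 3000; i++) {
            String line = progress.onRow(System.currentTimeMillis());
            if (line != null) {
                System.out.println(line);
            }
        }
    }
}
```

Passing the current time as a parameter keeps the method easy to check with fixed timestamps; in a tJavaRow, System.currentTimeMillis() would be used directly.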

Launch Talend with another JDK than the default one

On a new installation, you may get a message such as "Version 1.8.xxx of the JVM is not suitable for this product. Version: 11 or greater is required".

You can still run Talend without changing the default JVM: download OpenJDK (version 11 or higher) and launch Talend with the following shortcut, which specifies the path to the JDK:

TOS_ESB-win-x86_64.exe -vm "E:\Utilities\OpenJDK11\jre\bin"
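To check which JVM a job actually runs on, a tJava could print the standard java.version and java.home system properties. A minimal sketch, in which the class name JvmInfo is illustrative only:

```java
class JvmInfo {
    // Builds a one-line description of the JVM executing this code.
    static String describe() {
        return "Java " + System.getProperty("java.version")
                + " (" + System.getProperty("java.home") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

In a tJava, the same two System.getProperty calls can be passed to System.out.println or to the logger.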

Display the parameters for calling a Web service

On Talend ESB, after a tRESTRequest, you can display the parameters of the call to your web service as follows:

  • Add a tFlowToIterate to keep the call parameters
  • Add a tJava with the code below:
    // Show caller's IP
    // - Get restRequest object
    java.util.Map mapRequest = (java.util.Map)globalMap.get("restRequest");

    // - Get MessageContext object
    org.apache.cxf.jaxrs.ext.MessageContext messageContext = (org.apache.cxf.jaxrs.ext.MessageContext) mapRequest.get("MESSAGE_CONTEXT");

    log.info("- IP : {}", messageContext.getHttpServletRequest().getRemoteAddr());
    log.info("- Host : {}", messageContext.getHttpServletRequest().getRemoteHost());
    log.info("- Headers : {}", mapRequest.get("ALL_HEADER_PARAMS"));
    log.info("- Parameters : {}", mapRequest.get("ALL_QUERY_PARAMS"));

Last modified on 04/06/2020 - Quillevere.net
