This is a short guideline for Kettle, i.e. Pentaho Data Integration (PDI), mainly for Spoon, its development environment. First read the general information about the Pentaho platform and PDI.
Since Hitachi acquired Pentaho, development of the Pentaho platform has stagnated, and the platform's website is confusing. The original author of PDI (Matt Casters) created Apache Hop, a fork of PDI that finally has an elegant architecture and is under intensive development. PDI transformations can be imported into it.
Spoon.bat. In case of problems: video
|Text file input|| Use for CSV also (not
|Other steps for data input and output from/to databases, other sources (e-mail, local computer, FTP, HTTP), and files (MS Excel, MS Access, ESRI SHP, XML, JSON, YAML, RSS, dBase, ZIP, etc.)|
|Text file output|| Can set huge
|Microsoft Excel Writer|
|Filter rows||For multiple options, use Switch-Case.|
|Formula|| More functions than
|Calculator|| Faster than
|Sort rows|| Also an option:
|Replace in string|
|Stream lookup||To join two streams (tables) without the need to sort them.|
|Row denormaliser|| Key – input categories.
More: Microsoft Power Query for Excel
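The idea behind Stream lookup (joining two streams without sorting them) can be illustrated outside PDI with a hash index. The following is only a minimal Python sketch of that idea, not PDI's implementation; the function name `stream_lookup`, the field names, and the sample rows are hypothetical.

```python
def stream_lookup(main_rows, lookup_rows, key, fields, default=None):
    """Add `fields` from the matching lookup row to every main row."""
    # Index the lookup stream once by key; neither stream needs sorting.
    index = {row[key]: row for row in lookup_rows}
    for row in main_rows:
        match = index.get(row[key])
        enriched = dict(row)
        for field in fields:
            # Rows without a match receive the default value.
            enriched[field] = match[field] if match is not None else default
        yield enriched


# Hypothetical sample data, deliberately unsorted.
stations = [{"station_id": 2, "name": "B"}, {"station_id": 1, "name": "A"}]
measurements = [
    {"station_id": 1, "value": 10},
    {"station_id": 3, "value": 7},
    {"station_id": 2, "value": 5},
]
joined = list(stream_lookup(measurements, stations, "station_id", ["name"]))
```

The lookup stream is held in memory, which is why this approach suits a small reference table joined to a larger main stream.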
|Set Variables||In other transformations, this variable can be used as a variable or as a parameter. The parameter can have a default value (taken into effect if the variable is not defined).|
|ETL Metadata Injection|| To control the transformations. Combine with
Matt Casters: Parse nasty XLS with dynamic ETL
At the end of the article is an example including source code.
Alternative: run the transformation in a job and check Execute every input row – video.
|Transformation Executor||Every row runs a new transformation.|
|Analytic Query||To involve data from multiple rows. Aggregation.|
|Modified Java Script Value|
|User Defined Java Expression|
|Pentaho Reporting Output||Feed and create reports designed in PRD.|
|Regex evaluation||Regular expressions. My examples below.|
|Dummy (do nothing)|| Useful for merging streams or to see result of some step (e.g.
|Get a file with FTP|
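The fallback described for Set Variables (a parameter's default value applies only when the variable is not defined) can be sketched as follows. This is a simplified Python illustration of `${NAME}` substitution, not PDI's own code; the function `resolve`, the name `INPUT_DIR`, and the paths are hypothetical, and leaving unknown names untouched is an assumption of this sketch.

```python
import re

# Matches references of the form ${NAME}.
_VAR = re.compile(r"\$\{([^}]+)\}")


def resolve(text, variables, parameter_defaults):
    """Replace ${NAME} with the variable value, else the parameter default."""
    def lookup(match):
        name = match.group(1)
        if name in variables:
            return variables[name]           # a defined variable wins
        if name in parameter_defaults:
            return parameter_defaults[name]  # otherwise use the default
        return match.group(0)                # assumption: keep unknown names as-is
    return _VAR.sub(lookup, text)


defaults = {"INPUT_DIR": "/tmp/in"}
with_default = resolve("${INPUT_DIR}/data.csv", {}, defaults)
with_variable = resolve("${INPUT_DIR}/data.csv", {"INPUT_DIR": "/data"}, defaults)
```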
Table Selection of the input files (regex corresponds to the file name)
|Files starting with ||
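A pattern for selecting input files can be tried out on a list of names before it is entered in Spoon. The sketch below uses Python's `re` module and assumes the pattern has to match the whole file name (as Java's `String.matches` does); the file names and the pattern are hypothetical examples, and Java and Python regex syntax differ in some details.

```python
import re


def select_files(names, pattern):
    """Return the names that the pattern matches in full."""
    rx = re.compile(pattern)
    return [name for name in names if rx.fullmatch(name)]


# Hypothetical directory listing.
names = ["sales_2020.csv", "sales_2021.csv", "notes.txt", "old_sales_2019.csv"]

# Files starting with "sales_" and ending with ".csv".
selected = select_files(names, r"sales_.*\.csv")
```

Note that the dot before the extension is escaped; an unescaped `.` would match any character.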
Table Select part of a text string
|stanice: ČK (9 m n.m)||9 m n.m|
|Up to |
|Up to |
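The first row of the table (extracting `9 m n.m` from `stanice: ČK (9 m n.m)`) can be reproduced with a capture group. The pattern below is one possible regex chosen for this sketch, not necessarily the one used in the original table, and the function name `text_in_parentheses` is hypothetical.

```python
import re


def text_in_parentheses(value):
    """Return the text inside the first pair of parentheses, or None."""
    # \( and \) match literal parentheses; ([^)]*) captures what is between them.
    match = re.search(r"\(([^)]*)\)", value)
    return match.group(1) if match else None


extracted = text_in_parentheses("stanice: ČK (9 m n.m)")
```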
Date Format Lenient or Lenient number conversion if the data type is not resolved properly or returns an error.
Spoon.bat in a Windows environment nothing happens. How can I solve it?
start javaw with only
pause in the next line.
simple-jndi, which contains a file named jdbc.properties. You should change this file so that the JNDI information matches the one you use in your application server.
ROLDÁN, María Carina, 2017. Learning Pentaho Data Integration 8 CE : Third Edition. Packt Publishing. ISBN 978-1-78829-007-4.