If you cannot install it, use the portable version.
When Java gets updated then
JAVA_HOME
path in Windows needs to be updated accordingly.This is a short guideline for Kettle: Pentaho Data Integration (PDI) – mainly with Spoon – the development environment. First read general information about Pentaho platform and PDI.
Since Hitachi acquired Pentaho, the development of the Pentaho platform has stagnated, and the platform's website is confusing. The original author of PDI (Matt Casters) created Apache Hop, a fork of PDI that finally has elegant architecture and is being intensively developed. PDI transformations can be imported into it.
Spoon.bat
. In case of problems: videoSymbol | Name | Description |
---|---|---|
Text file input | Use for CSV also (not CSV file input that cannot process the whole folder). |
|
Other steps for data input and output from/to databases, other sources (e-mail, local computer, FTP, HTTP), and files (MS Excel, MS Access, ESRI SHP, XML, JSON, YAML, RSS, dBase, ZIP, etc.) | ||
Text file output | Can set huge length and return an error. Solution: do not define length . |
|
Table output | ||
Microsoft Excel Writer | ||
Filter rows | For multiple options, use Switch-Case. | |
Formula | More functions than Calculator . |
|
Calculator | Faster than Formula . |
|
Group by | ||
Select values | ||
Sort rows | Also an option: Only pass unique rows? |
|
Replace in string | ||
Split Fields | ||
Stream lookup | To join two streams (tables) without the need to sort them. | |
Row Normaliser | Type field (name of the new column of categories)Fieldname (input header)Type (values of input categories)New field (output header of values) – needs to be one value for all |
|
Row denormaliser | Key – input categories.The key field (name of the input colums with categories),Group field (what identifies the whole future row – e.g. filename),Target fieldname = Key value (single categories),Value fieldname (name of the input column with values)More: Microsoft Power Query for Excel |
|
Set Variables | In other transformations, this variable can be used as a variable or as a parameter. The parameter can have a default value (taken into effect if the variable is not defined). | |
ETL Metadata Injection | To control the transformations. Combine with Transformation Executor .Best practices. Matt Casters: Parse nasty XLS with dynamic ETL At the end of the article is an example including source codes. Alternative: run the transformation in job and check Execute every input row – video. |
|
Transformation Executor | Every row runs a new transformation. | |
Add Constants | ||
Analytic Query | To involve data from multiple rows. Aggregation. | |
Modified Java Script Value | ||
User Defined Java Expression | ||
Pentaho Reporting Output | Feed and create reports designed in PRD. | |
Add sequence | ||
Regex evaluation | Regular expressions. My examples below. | |
Dummy (do nothing) | Useful for merging streams or to see result of some step (e.g. Filter rows ). |
Other steps:
Symbol | Name | Description |
---|---|---|
START | ||
Transformation | ||
Success | ||
Get a file with FTP | ||
Unzip file | ||
Truncate tables |
Official source with all transformation steps and job entries.
Table Selection of the input files (regex corresponds to the file name)
Description | Reguar expression |
---|---|
.xlsx files | .*\.xlsx |
All files | .* |
Files starting with facts | facts.* |
Table Select part of a text string
Description | Regular expression | Input | Output |
---|---|---|---|
Between brackets | .*\((.*)\).* | stanice: ČK (9 m n.m) | 9 m n.m |
Up to , ,; or . | ([^\s?:(?!;).]+).* | ||
Up to , , ; , . or similar | ([^\s]+).* |
${Internal.Entry.Current.Directory}
)Date Format Lenient
or Lenient number conversion
if data type is not resolved properly or returns error.Spoon.bat
in a Windows environment nothing happens. How can I solve it?Spoon.bat
file and:start javaw
with only java
.pause
in the next line.simple-jndi
, which contains a file named jdbc.properties
. You should change this file, so the JNDI information matches the one you use in your application server.ROLDÁN, María Carina, 2017. Learning Pentaho Data Integration 8 CE : Third Edition. Packt Publishing. ISBN 978-1-78829-007-4.
JAVA_HOME
path in Windows needs to be updated accordingly.