Environmental data management and analytics with Pentaho
Originally published on it-novum ahead of Pentaho Community Meeting 2017.
An indoor hydrogeologist taking samples: Kamil Nešetřil
Analyzing environmental data helps us better understand the availability of natural resources. The Technical University of Liberec runs a data warehouse that integrates data from different sources of environmental information. Kamil Nešetřil will present this project at Pentaho Community Meeting and show how data warehouses can be built from semi-structured data.
Kamil, who are you?
I wanted to work outdoors, so I studied geology and hydrogeology. However, I ended up as a groundwater modeler, so I call myself an “indoor hydrogeologist”. I work in interdisciplinary teams at the Technical University of Liberec in the Czech Republic (the city is known as Reichenberg in German). We have IT students, but our research focuses on groundwater and the environment. I have moved toward IT, and I am connecting both worlds.
What is your connection to Pentaho?
Building a groundwater model requires integrating knowledge from scattered data (reports, spreadsheets, maps), understanding the processes (where the water flows), and creating a conceptual model (a simplification of that understanding). Finally, if there is enough time, one can build and evaluate the groundwater model itself. A simple model can be just a formula in a spreadsheet.
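To illustrate how small such a model can be, here is a minimal sketch of Darcy's law, a standard groundwater formula that would fit in a single spreadsheet cell. The function name, parameter names, and input values are illustrative assumptions. They are not taken from the interview or from the dataearth.cz system.

```python
def darcy_discharge(k, area, head_up, head_down, length):
    """Darcy's law: Q = K * A * (h1 - h2) / L.

    k         hydraulic conductivity [m/s]
    area      cross-sectional area perpendicular to flow [m^2]
    head_up   hydraulic head at the upstream point [m]
    head_down hydraulic head at the downstream point [m]
    length    flow-path length between the two points [m]
    Returns the volumetric discharge Q [m^3/s].
    """
    if length <= 0:
        raise ValueError("length must be positive")
    gradient = (head_up - head_down) / length  # dimensionless hydraulic gradient
    return k * area * gradient


# Hypothetical inputs, chosen only to show the call
q = darcy_discharge(k=1e-4, area=50.0, head_up=12.0, head_down=10.0, length=100.0)
print(f"Q = {q:.2e} m^3/s")
```

In a spreadsheet, the same calculation is one cell multiplying the conductivity, the area, and the head difference divided by the distance.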
I wanted to use domain-specific software to help me with data integration, visualization, and reporting. I reviewed such tools, but I decided to develop a Pentaho-based solution instead. Integrating data from diverse sources is easier in PDI (Pentaho Data Integration) than in a domain-specific data-management tool, and designing a report takes the same effort with a domain-specific tool as with Pentaho. I am not a programmer, so I appreciate Pentaho's design tools and the fact that somebody has already designed the architecture of the whole system.
What will your talk be about?
I will speak about the Hydrogeological Information System (dataearth.cz), which is based on the Pentaho Platform. There are several data warehousing projects for environmental data, and some environmental data management systems use existing reporting tools. But we have the first environmental application of the full BI stack. I will show how we build a data warehouse from semi-structured data.
Why are projects like dataearth so important?
Our solution brings tools and concepts from BI into a world where there are generally only Excel files and where even IT people do not know what BI means. The challenge lies in diversity (long-tail data), not volume (big data).