System Architecture
This section is the technical description of HgIS. Readers who are not interested in information technology can skip it.
The architecture of the system corresponds to a spatial business intelligence solution (GeoBI) – a combination of BI and GIS.
Analysis
The data needed to develop groundwater models are of various types; they are time-dependent and fully three-dimensional. The source data exist in various formats: databases, data exchange formats (e.g. DBF, XML, or flat files), archive data (e.g. MS Excel or MS Word), and spatial data (e.g. ESRI SHP, KML, or geodatabases). These data must be brought into a single data structure so that they can be used together. Highly structured data usually do not contain any interpretation or additional knowledge, so it is necessary to store and process all types of data adequately, not only the structured ones. Some data should be saved in a structured form so that they can be reused (for graphs, tables, maps, cross sections, etc.). Other data are used ad hoc in the form in which they were obtained, so it is sufficient to store them and keep them accessible, e.g. in the file system. Table 1 summarizes the data and processes and serves as a data flow diagram.
Table 1. Data flows – data sorted from structured to unstructured
Data source | Storage | Usage | Content |
---|---|---|---|
Structured and semi-structured data – observations (databases, files) | Data warehouse | Reporting, visualization incl. geological profiles and cross sections, export | Data |
Spatial interpretation of data, other geodata | Standard-based storage | Maps, GIS | Information |
Documents | Stored with metadata | Searching, citing | Data, information, knowledge |
Other files | Storage, accessibility | Ad hoc | |
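The routing in Table 1 can be illustrated with a minimal Python sketch. It assumes that the storage class can be judged from the file extension alone, which is a simplification; the extension sets and the function name `storage_for` are hypothetical and only mirror the formats named in the text above.

```python
from pathlib import PurePath

# Hypothetical extension sets, following the formats named in the text.
STRUCTURED = {".dbf", ".xml", ".csv", ".txt", ".xls", ".xlsx"}
SPATIAL = {".shp", ".kml"}
DOCUMENTS = {".doc", ".docx"}


def storage_for(filename: str) -> str:
    """Return the Table 1 storage class for a file, judged by extension only."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in STRUCTURED:
        return "data warehouse"
    if suffix in SPATIAL:
        return "standard-based storage"
    if suffix in DOCUMENTS:
        return "stored with metadata"
    return "plain file storage"  # everything else is kept as-is for ad hoc use


for name in ["heads_2019.xlsx", "boreholes.SHP", "report.docx", "site_photo.jpg"]:
    print(name, "->", storage_for(name))
```

In practice the decision also depends on the content of the file (for example, a spreadsheet may hold observations or a free-form report), so a real classification would need more than the extension.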
Business Intelligence
Groundwater information management can be described as loading both archive and current data (which are no longer modified) from diverse structured and semi-structured sources; visualizing the data in tables and graphs (downloadable in common formats such as MS Word and MS Excel); analyzing the data; and developing models. The same description fits a completely different discipline, Business Intelligence (BI), which uses data about a company to support its managers' decision-making. HgIS therefore uses the BI platform Pentaho, a Java-based product of Hitachi Vantara that is also available in an open-source version. The platform includes an ETL (extract, transform, load) tool, Pentaho Data Integration (PDI). Reports designed in Pentaho Report Designer (PRD) can be run on a local computer or on the BI application server, Pentaho Server (PBA). PBA lets users design dashboards, analyze OLAP cubes, etc. The Pentaho platform can be easily integrated with or embedded into other applications.
HgIS
HgIS is an information system developed at the Technical University of Liberec in the Czech Republic. Its purpose is to load data from available data sources of any kind, to visualize and analyze the data (to support the formulation of alternative conceptual models), and to implement simple models based on the data. Although HgIS is focused on groundwater, it is also being used for a broader range of environmental data. Table 2 shows how specific kinds of data are managed in HgIS.
Table 2. HgIS architecture
Data source | -> | Storage | -> | Usage |
---|---|---|---|---|
Observations (XML, MS Excel, flat files, SQL databases) | ETL (PDI) | Data warehouse (PostgreSQL) | BI Platform (Pentaho) | Reporting, visualization, procedural models |
| | | | ETL (PDI) | Complex visualization |
Spatial data (ESRI SHP, KML, raster images etc.) | GIS (QGIS), ETL (PDI) etc. | Spatial database (PostGIS) + georeferenced images | Map server (QGIS Server) | Online map application, desktop GIS (e.g. QGIS) |
Documents | | Reference management software (Zotero), knowledge base (CMS, ECM) | | Search, cite |
Other files | | File system | | Ad hoc |
Figure: HgIS architecture
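The first row of Table 2 (observations loaded by ETL into the data warehouse) can be sketched in a few lines of Python. This is only an illustration of the extract, transform, load pattern, not the PDI transformations used in HgIS: the sample file content, the column names, and the table `head_measurement` are hypothetical, and an in-memory SQLite database stands in for PostgreSQL.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file export: one head measurement per row.
RAW = """well;date;head_m
HV-1;2020-01-15;312.40
HV-1;2020-02-15;312,25
HV-2;2020-01-15;n/a
"""


def extract(text):
    """Extract: parse the semicolon-separated flat file into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def transform(rows):
    """Transform: normalize decimal commas and drop non-numeric values."""
    clean = []
    for row in rows:
        try:
            head = float(row["head_m"].replace(",", "."))
        except ValueError:
            continue  # skip rows without a numeric value
        clean.append((row["well"], row["date"], head))
    return clean


def load(conn, records):
    """Load: append the cleaned records to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS head_measurement "
        "(well TEXT, measured_on TEXT, head_m REAL)"
    )
    conn.executemany("INSERT INTO head_measurement VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL warehouse
load(conn, transform(extract(RAW)))
print(conn.execute("SELECT COUNT(*) FROM head_measurement").fetchone()[0])
```

Keeping the three steps as separate functions mirrors how an ETL tool chains independent steps, so that a new source format only requires a new extract step.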
Database – data warehouse
It is reasonable to base a newly developed information system on an existing data model. We reviewed available data exchange standards and data models such as GWML, other application schemas of GML, INSPIRE, Hg2O, Arc Hydro Groundwater, Data Model of National Groundwater Information System, H+, and some others (see Competitors and Standards for groundwater data exchange). None of these data models was adopted: among other issues, some did not suit the needs of groundwater practitioners, and some were too concise or insufficiently documented. All the reviewed data models, data exchange formats, and data models of EDMS nevertheless served as inspiration for the developed data model.
Hydrogeological data can be easily visualized on a desktop computer with EnviroInsite from EI LLC, a low-priced .NET application. It can display maps (including localized tables and graphs), technical documentation of boreholes, geological cross-sections, 3D geological models, and interpolation in 2D and 3D. The data model of HgIS is based on the existing data model of EnviroInsite. The database and the visualization software therefore have consistent data structures, which reduces the need for ambiguous data transformations and avoids confusing users.
The original EnviroInsite data model (9 tables) contains only the data relevant for visualization, so it was extended to 36 tables. The original tables were extended with additional fields, and the model was further normalized. The data model contains data on:
- observation objects*,
- characterization of geological layers,
- technical construction of wells,
- definition of observed quantities*,
- action levels,
- definition of vertical intervals*,
- measurements tied to vertical intervals (e.g. chemical assays or head measurements)*,
- measurements tied to a specific depth (e.g. geophysical logging),
- sampling conditions,
- conversion of units (e.g. mg to g) and quantities (e.g. nitrate to nitrogen),
- anti-aliasing,
- time intervals,
- metadata,
- lookup tables etc.
The tables holding the data marked with an asterisk (*) are organized in a snowflake schema.
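A minimal sketch of such a snowflake arrangement is shown here, assuming a fact table of interval measurements whose dimensions are themselves normalized (interval, then observation object). All table and column names are hypothetical and do not reproduce the actual HgIS data model; SQLite is used only so that the example runs without a PostgreSQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
conn.executescript("""
-- Outer dimension: the observation object (e.g. a well).
CREATE TABLE observation_object (
    object_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Dimension normalized against the object: this is the "snowflake" part.
CREATE TABLE vertical_interval (
    interval_id INTEGER PRIMARY KEY,
    object_id   INTEGER NOT NULL REFERENCES observation_object(object_id),
    top_m       REAL NOT NULL,
    bottom_m    REAL NOT NULL
);
-- Dimension: the observed quantity and its unit.
CREATE TABLE quantity (
    quantity_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    unit        TEXT NOT NULL
);
-- Fact table: one measured value per interval, quantity, and date.
CREATE TABLE interval_measurement (
    interval_id INTEGER NOT NULL REFERENCES vertical_interval(interval_id),
    quantity_id INTEGER NOT NULL REFERENCES quantity(quantity_id),
    measured_on TEXT NOT NULL,
    value       REAL NOT NULL
);
INSERT INTO observation_object VALUES (1, 'HV-1');
INSERT INTO vertical_interval VALUES (10, 1, 12.0, 18.0);
INSERT INTO quantity VALUES (100, 'nitrate', 'mg/l');
INSERT INTO interval_measurement VALUES (10, 100, '2020-01-15', 41.5);
""")

# Reporting query: walk from the fact table out through the dimensions.
rows = conn.execute("""
    SELECT o.name, i.top_m, i.bottom_m, q.name, m.value, q.unit
    FROM interval_measurement AS m
    JOIN vertical_interval AS i ON i.interval_id = m.interval_id
    JOIN observation_object AS o ON o.object_id = i.object_id
    JOIN quantity AS q ON q.quantity_id = m.quantity_id
""").fetchall()
print(rows)
```

The measurement row reaches the observation object only through its vertical interval, which is what distinguishes a snowflake schema from a star schema with flat dimensions.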
We use the PostgreSQL database management system. Interpretations and non-point data (arcs, polygons, etc.) are stored in PostgreSQL using its spatial extension, PostGIS. These spatial data and georeferenced images are served by QGIS Server as a mapping service (e.g. WMS or WMTS) that can be loaded as a basemap into our web application or into any GIS.
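As an illustration of how a client could request a basemap image from such a service, the following sketch builds a standard WMS 1.3.0 GetMap URL. The server address, layer name, and bounding box are hypothetical placeholders, and the helper `getmap_url` is not part of HgIS; only the query parameters themselves come from the WMS standard.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, width=800, height=600, crs="EPSG:3857"):
    """Build a WMS 1.3.0 GetMap URL; bbox is (minx, miny, maxx, maxy)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
url = getmap_url(
    "https://example.org/qgis",
    "geology",
    (1650000, 6570000, 1700000, 6620000),
)
print(url)
```

Note that WMS 1.3.0 takes the axis order of the bounding box from the coordinate reference system, so a geographic CRS such as EPSG:4326 expects latitude first.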