The final version of the integrated BigDataOcean platform has been released and was used to conduct the third and final evaluation phase. It includes the latest versions and upgrades of all technical components and offers additional functionalities and improved performance, enhancing the platform’s offerings and the realisation of the integrated maritime data value chain of BigDataOcean. To achieve this, the feedback received from the end users’ assessment of the previous release of the platform (Release 3.00) was thoroughly analysed, and the outcome of this analysis was translated into a series of improvements and refinements to both the backend and the frontend services of the platform, so that the offered functionalities address the stakeholders’ needs.
BigDataOcean Back-End Services
In the final version of the BigDataOcean platform, the integrated backend services have been enhanced and/or fine-tuned so that they provide the backend functionalities that the frontend services use to implement the platform’s envisioned offerings for the different stakeholder groups. The backend services are the Data Ingestion, the Query Execution, the Service Execution and Building, the Visualisation Generation, the Dashboard Creation and Display and the Access Control Service.
The Data Ingestion provides the automated mechanism for the retrieval, semantic enrichment and parsing of incoming datasets. It is composed of four services, namely the File Handler, the Vocabulary Repository, the Harmonisation Tool and the File Parser.
The BigDataOcean File Handler is responsible for retrieving new raw datasets (via FTP or HTTP) in a preconfigured time range and storing them in HDFS. An important update since the 3rd version of the File Handler is that the list of available schedulers has been expanded to support fetching near-real-time AIS data and measurements from IoT sensors. Once a new dataset is available, the Harmonisation Tool is informed and the semantic enrichment of the new raw dataset starts. The Harmonisation Tool utilises the metadata profile for the selected data source and automatically produces the appropriate metadata, describing both the metadata and the datasets themselves, using the metadata vocabularies available in the Vocabulary Repository. An important update since the 3rd version of the Harmonisation Tool is that the available metadata are automatically translated into English using the Yandex API. Once the semantic enrichment is completed, one of the multiple running instances of the File Parser is informed and the parsing process starts. The parsed and normalised information is stored in Hive through Presto and is available for querying and analysis. The whole process is orchestrated by Apache Kafka, and the steps are connected asynchronously through two Kafka topics: one containing information about newly available datasets without metadata, and one containing information about newly semantically enriched datasets that have not been parsed yet. The BigDataOcean automated data ingestion process is illustrated in the following figure.
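The asynchronous hand-off between the ingestion steps described above can be sketched as follows. This is a minimal illustration only, not the platform's implementation: in-memory queues stand in for the two Kafka topics, and the topic names, message fields, file path and function names are all hypothetical.

```python
import queue

# In-memory stand-ins for the two Kafka topics (names are hypothetical).
files_without_metadata = queue.Queue()  # new raw datasets, not yet enriched
files_with_metadata = queue.Queue()     # enriched datasets, not yet parsed


def file_handler(path, source):
    """Announce a newly stored raw dataset on the first topic."""
    files_without_metadata.put({"path": path, "source": source})


def harmonisation_step():
    """Consume one raw-dataset message, attach metadata, publish to the second topic."""
    msg = files_without_metadata.get()
    # Placeholder enrichment: a real profile would use the Vocabulary Repository.
    enriched = dict(msg, metadata={"source": msg["source"], "language": "en"})
    files_with_metadata.put(enriched)


def file_parser_step(storage):
    """Consume one enriched message and record the dataset as parsed."""
    msg = files_with_metadata.get()
    storage.append(msg["path"])


storage = []  # stands in for the Hive tables written through Presto
file_handler("/raw/ais_sample.csv", "ais")  # hypothetical path and source name
harmonisation_step()
file_parser_step(storage)
print(storage)
```

Because each step only reads from one queue and writes to the next, the steps do not need to know about each other, which is the property that allows several File Parser instances to run side by side.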
The Query Execution enables the expression and execution of both simple and complex queries that combine information from different datasets available in the platform’s storage. It also offers some basic information about the stored datasets and variables.
The Service Execution and Building offers the ability to execute advanced analytical services over big and diverse datasets coming from multiple sources. The service execution utilises Apache Spark 2.2 with its Python or Scala API, together with libraries such as MLlib and Weka. The code of the services is written and maintained in Apache Zeppelin notebooks, and Apache Livy is utilised for the remote execution of Spark jobs over a REST API.
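As an illustration of remote execution through Livy, the sketch below builds (without sending) the HTTP request that submits a code snippet to an existing interactive session through Livy's `/sessions/{id}/statements` endpoint. The host, port, session id and PySpark snippet are assumptions made for the example, and `build_statement_request` is a hypothetical helper, not part of the platform.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumption: Livy on its default port, local host


def build_statement_request(session_id, code, base_url=LIVY_URL):
    """Build (but do not send) a POST to Livy's /sessions/{id}/statements endpoint."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical PySpark snippet; `spark` is the session object exposed inside Livy.
snippet = "spark.range(10).count()"
request = build_statement_request(session_id=0, code=snippet)
print(request.get_method(), request.full_url)
# Sending it to a running Livy server would be: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a running Spark cluster.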
The Visualisation Generation enables the generation of different visualisations that can be viewed in multiple areas of the platform: when a user is exploring the available datasets, viewing a dashboard or executing a service. The visualisations are divided into three groups: a) Chart visualisations such as line charts, column charts, pie charts, histograms and time series charts, b) Map visualisations such as plotlines, polygons, contours on map, heatmaps and markers on map and c) Miscellaneous visualisations such as data tables and aggregate values. An important update since the 3rd version of the Visualisation Generation is that Histogram 2D and Live AIS Visualisation were implemented and added to the Query Designer and Dashboard Builder environments.
The Dashboard Creation and Display enables the creation of customised dashboards/reports that consist of multiple components (widgets), such as visualisations, notes containing text, images and more, in a custom layout defined by the user. A dashboard object is created and stored; when a display request is received, it is retrieved and displayed to the user.
The Access Control Service provides the access control mechanism of the BigDataOcean platform. Within the platform, the Attribute-Based Access Control (ABAC) model is adopted, in which access to Resources (the datasets, dashboards and services of the platform) is controlled by evaluating rules (policies) against the attributes of the Subjects (users), the Actions and the environment relevant to a request. An important update since the 3rd version of the Access Control Service is the implementation of a new mechanism for creating access requests to the different resources of the platform, along with a mechanism for accepting or declining these requests.
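A minimal sketch of attribute-based policy evaluation is given below. It assumes a simple rule format in which a policy lists the attribute values it requires and a request is permitted when at least one policy matches; the attribute names, the policy format and the deny-by-default behaviour are illustrative assumptions, not the platform's actual policy language.

```python
def policy_matches(policy, request):
    """A policy matches when every attribute it names has the expected value."""
    return all(
        request.get(category, {}).get(attribute) == expected
        for (category, attribute), expected in policy.items()
    )


def is_permitted(policies, subject, resource, action, environment):
    """Permit the request if at least one policy matches; deny by default."""
    request = {
        "subject": subject,
        "resource": resource,
        "action": action,
        "environment": environment,
    }
    return any(policy_matches(policy, request) for policy in policies)


# Hypothetical policy: analysts of organisation "org-a" may view datasets.
policies = [
    {
        ("subject", "organisation"): "org-a",
        ("subject", "role"): "analyst",
        ("resource", "type"): "dataset",
        ("action", "name"): "view",
    }
]

analyst = {"organisation": "org-a", "role": "analyst"}
dataset = {"type": "dataset", "owner": "org-b"}

can_view = is_permitted(policies, analyst, dataset, {"name": "view"}, {})
can_delete = is_permitted(policies, analyst, dataset, {"name": "delete"}, {})
print(can_view, can_delete)
```

In this sketch the decision depends only on attributes carried by the request, so granting access to a new user or resource requires no change to the evaluation code, only to the attributes or policies.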
BigDataOcean Front-End Services
In the final version of the integrated BigDataOcean platform, the complete set of foreseen front-end services has been implemented, providing the different environments and tools of the platform that will be exploited by the different stakeholder groups. These front-end services are illustrated in the following figure:
The Landing Page, Dataset Exploration & Metadata service introduces users to the offerings of the platform upon successful login. From this page, the user can navigate to the BDO Datasets through the search bar in the centre of the page, open the Query Designer Tool through the “Explore and Create” link, or see the available applications and dashboards. When using the search bar to navigate to the datasets, the user can see aggregated information for each of them and use the filters on the left side of the page to find the datasets of interest. Important updates since the 3rd version of the Landing Page, Dataset Exploration & Metadata front-end service include additional filtering and sorting options when searching for the available datasets, the display of detailed information about the kind of variables stored in each dataset, and the ability to request access to a private dataset.
On the User Profile page, which is reached by clicking the profile icon on the right side of the top navigation bar, the user can view and edit basic profile information. The user can also provide additional information, such as the organisation the user is associated with, the user’s business role and more.
By selecting the Query Designer’s environment, the user is able to explore the available datasets of the platform and create custom queries with rich expressive capabilities through an intuitive graphical interface, without writing any SQL. First, the user selects the dataset to query from the list of all available datasets. To narrow the list, the user can use the search filter to find a specific dataset, search datasets by variable or publisher using the related dropdown lists, or filter datasets with time or depth information using the related checkboxes. The layout of the Query Designer environment has been updated to be more user-friendly and easier to use. In addition, more options have been made available since the previous version, such as the ability to show only datasets that are being updated, to sort datasets by different fields and to filter them based on their spatial and temporal coverage. Moreover, additional information is shown for each of the displayed datasets, such as metadata and the new “COVERAGE” tab that contains coverage information. Once the dataset and the variables to be queried have been selected, this information is passed to the main Query Designer environment.
To better guide users in using the tool, a set of pop-overs is displayed that explains the capabilities of the tool and how to perform certain actions, such as running the created query or selecting more variables. After the execution is finished, the raw result data are listed. When a query combines variables from different datasets, the two datasets may not match on their dimensions, such as space and time. This may result in the very time-consuming process of examining the whole Cartesian product of the datasets. To avoid this, an initial check is performed on the metadata of the datasets before the actual execution, and the user is informed with a message about whether the query can be executed. The “CHART” tab of the right panel lists all the visualisation types available in the platform. The user can click on an item to see a form that guides the configuration of the specific visualisation (e.g. selecting the axis variables). The visualisation configuration fields have been updated so that, by utilising the available metadata of the variables, visualisations that require numerical input accept only the numerical fields of the query. Moreover, the items on the visualisation list are automatically enabled or disabled depending on the fields of the query results. For example, if the query results do not contain latitude and longitude information, the map visualisations are disabled.
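The two metadata-driven checks described above can be sketched as follows. Both rules are simplified assumptions made for illustration: the pre-execution check here only requires that the two datasets share at least one dimension to join on, and the required fields per visualisation type, as well as all names in the code, are hypothetical.

```python
def can_combine(meta_a, meta_b):
    """Pre-execution check: True if the datasets share at least one dimension.

    With no shared dimension there is nothing to join on, so the query
    would degenerate into the Cartesian product of the two datasets.
    """
    return bool(set(meta_a["dimensions"]) & set(meta_b["dimensions"]))


# Hypothetical mapping from visualisation type to the fields it requires.
REQUIRED_FIELDS = {
    "line_chart": set(),
    "markers_on_map": {"latitude", "longitude"},
    "heatmap": {"latitude", "longitude"},
}


def enabled_visualisations(result_fields):
    """Return the visualisation types whose required fields are all present."""
    fields = set(result_fields)
    return sorted(
        name for name, needed in REQUIRED_FIELDS.items() if needed <= fields
    )


waves = {"dimensions": ["time", "latitude", "longitude"]}
vessels = {"dimensions": ["time", "vessel_id"]}
ports = {"dimensions": ["port_id"]}

print(can_combine(waves, vessels))  # share the "time" dimension
print(can_combine(waves, ports))    # nothing in common
print(enabled_visualisations(["time", "wave_height"]))
print(enabled_visualisations(["time", "latitude", "longitude", "wave_height"]))
```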
By selecting the Dashboard Builder’s environment, the user is able to create a custom dashboard/report by adding multiple widgets with visualisations, images, tables and text in a single place. To add a new visualisation to the dashboard, an interface similar to the Query Designer’s environment is used, in which the user is prompted to select the data to visualise (one of the already saved queries) and the visualisation type, and to provide the custom configuration. A preview of the visualisation is presented to the user along with the option to add it to the dashboard. Additionally, the user can include custom notes, text, images, tables and other elements in the dashboard using a WYSIWYG editor. Once all widgets are included in the newly created dashboard, the user is able to resize each widget separately, change the layout and the position of each widget, and provide custom titles for each widget and for the whole dashboard. When satisfied with the dashboard, the user can save it by pressing the “Save Dashboard” button. The user is able to view and edit the saved dashboard at any time. Important updates since the 3rd version of the Dashboard Builder front-end service include the creation of the interface for deleting an existing dashboard, and of the interface for sharing a dashboard and granting editing permission.
All the available services and the created dashboards of the BigDataOcean platform are gathered in the “Applications & Dashboards” page. On this page, the user can see some information and a short description for each service and dashboard, and can choose to view one of them. Three tabs are available, covering the five pilot applications and the services and dashboards of the current user.