The first integrated BigDataOcean platform was released featuring a low fidelity, functional mock-up version of the platform, offering the first set of services and features in accordance with the MVP and the integrated maritime data value chain of BigDataOcean. The first release of the platform was delivered for early assessment by the end users. The implementation activities for the integrated BigDataOcean platform were driven by the outcomes and knowledge obtained during the requirements elicitation and analysis, the design of the conceptual architecture of the platform, as well as the definition and design of the platform components. Building upon the BigDataOcean infrastructure, the BigDataOcean services were delivered with the aim of providing the first integrated BigDataOcean platform.
The cornerstone of the BigDataOcean platform is the underlying BigDataOcean infrastructure. The BigDataOcean infrastructure is supported by a cloud infrastructure with advanced virtualisation capabilities, enabling the development and deployment of the BigDataOcean platform while also ensuring security and identity trust, as well as enhanced privacy. The BigDataOcean infrastructure consists of three main elements. At the heart of the BigDataOcean infrastructure lies the software platform for cloud computing, which facilitates the deployment of virtual servers and other instances and manages the available resources of the infrastructure. The Big Data Storage of the BigDataOcean platform is supported by well-established storage solutions that ensure data integrity and correctness with high reliability and high availability, as well as high scalability, fault-tolerance and high-throughput access for efficient parallel processing. Finally, the available cluster-computing framework offers a powerful data processing and query processing engine capable of executing advanced, complex or even interactive queries, as well as batch and stream processing workloads, with implicit data parallelism and fault-tolerance. The deployment of the BigDataOcean infrastructure consists of seven virtual machines in total, with appropriately allocated hardware, in order to support the development and integration of the various components and services of the BigDataOcean platform.
To assure the quality of the integration process of the BigDataOcean platform, several tools and techniques were identified and utilised based on their level of maturity and the benefits they would offer to the integration process. For source code versioning and management, the Git version control system of GitLab was selected. GitLab is also used for issue tracking and continuous integration support, through specialised tools that it offers, such as the GitLab CI server. To safeguard the continuous integration process, automated build tools such as Maven, as well as powerful, well-established testing frameworks like TestNG and unittest, are used.
For the development of the integrated BigDataOcean platform, the consortium decided to adopt an incremental approach, and with this in mind an integration strategy and plan were defined. The four main versions of the integrated platform will be released following an iterative process, where each release will extend the set of offered functionalities and address the issues identified in the previous release with a set of updates and refinements. Each release will follow the same integration cycle containing the following steps: a) the definition and update of architectural components, b) the identification of dependencies between components, c) the definition of integration points that will resolve the identified dependencies, and of the new functionalities that each component will implement, d) the integration plan, where all the previous steps are taken into account and the intermediate releases of the components are scheduled, with the goal of smooth system integration. As a consequence, each integration cycle consists of the following activities: a) the responsible partner undertakes the necessary actions for the implementation and testing (via unit tests) of the component, based on the designs and the specifications, b) the integration tests of the released components are triggered by the automated continuous integration tool (GitLab CI server), and new issues are inserted in the issue tracking tool for the failing tests, c) as soon as the integration results are successful, the system integration is executed, where all produced components are deployed on the BigDataOcean infrastructure and system testing is performed for the release of the new version of the platform.
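The activities of one integration cycle can be illustrated with a minimal sketch. It assumes each released component ships a set of integration tests that either pass or fail. All names (ComponentRelease, run_integration_cycle, the component and test names) are hypothetical and are not part of the BigDataOcean code base or of the GitLab CI interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ComponentRelease:
    """Hypothetical record of a released component and its integration tests."""
    name: str
    # Maps a test name to a callable that returns True when the test passes.
    integration_tests: Dict[str, Callable[[], bool]]


def run_integration_cycle(releases: List[ComponentRelease]) -> Tuple[bool, List[str]]:
    """Run every integration test and collect one issue per failing test.

    Returns a flag telling whether system integration may start, together
    with the issues that would be inserted in the issue tracking tool.
    """
    issues: List[str] = []
    for release in releases:
        for test_name, test in release.integration_tests.items():
            if not test():
                issues.append(f"{release.name}: {test_name} failed")
    # System integration is executed only when no integration test failed.
    return (not issues), issues


# Illustrative releases (assumed names and outcomes).
releases = [
    ComponentRelease("query-designer", {"returns_results": lambda: True}),
    ComponentRelease("dashboard-builder", {"loads_saved_query": lambda: False}),
]
ready, issues = run_integration_cycle(releases)
```

In this sketch the second component blocks system integration until its failing test is resolved, which mirrors the rule that deployment and system testing start only once the integration results are successful.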
In the first version of the integrated BigDataOcean platform, the main services were implemented with the aim of providing the envisioned BigDataOcean platform functionalities and addressing the needs of the different stakeholder groups identified by the consortium. The list of available services in the first version of the platform is as follows: the Landing Page – Login, the Query Designer, the Dashboard Builder, the Service/Analytics Builder and the Published Services. In the following paragraphs the services of the first integrated BigDataOcean platform are presented.
Landing Page – Login
The landing page introduces users to the BigDataOcean platform by presenting the platform’s various offerings. It contains information about the available services of the platform, highlighting the benefits for each of the different stakeholder groups. On this page the user can also find additional information about the platform and the project. Finally, this page provides the sign-up/login point of the platform, where users can sign up to the platform or log in by providing their credentials.
Once the user is logged in, they are directed to the home page of the platform, where they are guided through the services offered by the platform, after a user-friendly search for services related to their activities. From this easy-to-use central location, the user can access the platform’s analytics toolbox in order to create queries, analyses or dashboards.
Query Designer
The Query Designer offers an intuitive graphical interface that facilitates custom query creation with user-friendly tools, eliminating the need for writing any SQL and at the same time offering rich expressive capabilities. Within the query designer’s environment, the user is able to build new or access previously saved queries, review the results of a query in raw or chart format, and apply filters on the results.
At first, the user can initiate the design of a new query or retrieve the list of saved queries in order to execute them again or modify them before execution. During the query design the user is presented with the list of available datasets and their respective variables. The user is able to select the desired variables that will be included in the query and is also able to add any “group-by” and aggregation functions to the query. Once the variables are selected and the query is executed, the results are available in two formats. In the first format, the results are presented as raw data, where the variable values are displayed along with the related dimensions. In the second format, the results are visualised on a suitable chart, according to the variable type, providing a better overview of the results of the query execution. Additionally, if multiple variables were selected and the “group-by” and aggregation functions were added during the query design, the platform will try to “join” the variables on all their common dimensions. Finally, the user can introduce filters that will be applied on the query results. The filters are created through a graphical interface where filter expressions can be set on every column, while multiple filters can be created and applied at the same time. Figure 2 and Figure 3 illustrate the Query Designer’s environment and the visualisation of the query execution results.
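The query steps described above (joining variables on their common dimensions, grouping with an aggregation function, and filtering the results) can be sketched in plain Python. This is only an illustration under stated assumptions: the sample rows, the dimension names (time, lat, lon), the variable names and all function names are hypothetical, and the sketch does not reproduce how the platform actually executes queries.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: one list of rows per variable, each row holding
# the variable's value together with its dimensions.
wave_height = [
    {"time": "2017-01-01", "lat": 37.0, "lon": 23.0, "wave_height": 1.0},
    {"time": "2017-01-01", "lat": 38.0, "lon": 24.0, "wave_height": 2.5},
    {"time": "2017-01-02", "lat": 37.0, "lon": 23.0, "wave_height": 2.0},
]
wind_speed = [
    {"time": "2017-01-01", "lat": 37.0, "lon": 23.0, "wind_speed": 5.0},
    {"time": "2017-01-01", "lat": 38.0, "lon": 24.0, "wind_speed": 9.0},
    {"time": "2017-01-02", "lat": 37.0, "lon": 23.0, "wind_speed": 7.0},
]


def join_on_common_dimensions(left, right, left_dims, right_dims):
    """Inner-join two variables on every dimension they share."""
    common = [d for d in left_dims if d in right_dims]
    index = defaultdict(list)
    for row in right:
        index[tuple(row[d] for d in common)].append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[d] for d in common), []):
            joined.append({**match, **row})
    return joined


def group_and_aggregate(rows, group_by, variable, aggregate):
    """Group rows by the given dimensions and aggregate one variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[d] for d in group_by)].append(row[variable])
    return [
        {**dict(zip(group_by, key)), variable: aggregate(values)}
        for key, values in groups.items()
    ]


def apply_filters(rows, filters):
    """Keep only the rows that satisfy every filter expression."""
    return [row for row in rows if all(f(row) for f in filters)]


dims = ["time", "lat", "lon"]
joined = join_on_common_dimensions(wave_height, wind_speed, dims, dims)
aggregated = group_and_aggregate(joined, ["lat", "lon"], "wave_height", mean)
filtered = apply_filters(aggregated, [lambda r: r["wave_height"] >= 2.0])
```

Here `joined` corresponds to the raw result format, `aggregated` to the grouped result, and `filtered` to the result after a single filter expression on the wave_height column. Several filters can be passed in the same list and are applied together.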
Dashboard Builder
The Dashboard Builder enables users to create dynamic dashboards and reports. Within a dynamic dashboard, multiple visualisations can be created on top of the user-defined queries, from a predefined list of available visualisation types that includes both charts and maps. These custom dashboards can be saved and accessed later at any time.
Each visualisation is created by selecting a query from the list of saved queries or by creating a new query. The visualisation type is selected from the list of available visualisation types and the proper arguments for the visualisation are set. Once the visualisation is created, it can be added to the dashboard. In addition to common chart types, the user can also create visualisations that present data on maps, either using coloured contours or point markers. Figure 4 presents an example of points on map visualisation and Figure 5 illustrates a dashboard creation consisting of column chart and contours on map visualisations.
Service/Analytics Builder
The Service and Analytics Builder enables the creation of custom services based on the set of available analytical components/algorithms offered by the BigDataOcean platform. The analytical components are configurable processes defined in the platform that the user can select in order to create a custom chain of analyses to be performed by the platform over the selected data. The Service and Analytics Builder offers a user-friendly graphical environment where the user can create an analytical flow based on the selected components. For each selected component, the user is able to provide a custom configuration via a set of parameters (depending on the component). When multiple components are sequentially selected, a logic flow of the analysis is created and visualised. The analyses are performed in a sequential way creating a custom chain of analyses. For example, a pre-processing function can be applied before a model is trained to estimate a set of parameters. Once the newly-created analytical flow is executed the results of the analysis are presented to the user (indicating as well the defined configuration).
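The idea of a sequential chain of configurable components can be sketched as follows. The sketch assumes that each component is a function taking the output of the previous one plus its own parameters. The Component class, run_flow, and the two example components (an outlier-removal pre-processing step and a least-squares line fit) are hypothetical and are not the platform's actual components.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Component:
    """Hypothetical analytical component with its user-supplied parameters."""
    name: str
    func: Callable[..., Any]
    params: Dict[str, Any] = field(default_factory=dict)


def remove_outliers(points, max_value):
    """Pre-processing step: drop points whose y value exceeds max_value."""
    return [(x, y) for x, y in points if y <= max_value]


def fit_line(points):
    """Model step: estimate slope and intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def run_flow(components: List[Component], data: Any) -> Dict[str, Any]:
    """Run the components in order, feeding each output to the next one."""
    result = data
    for component in components:
        result = component.func(result, **component.params)
    # Return the configuration alongside the result, as the builder does.
    configuration = [(c.name, c.params) for c in components]
    return {"configuration": configuration, "result": result}


flow = [
    Component("remove_outliers", remove_outliers, {"max_value": 50.0}),
    Component("fit_line", fit_line),
]
output = run_flow(flow, [(0, 1.0), (1, 3.0), (2, 100.0), (3, 7.0)])
```

With the sample points above, the pre-processing step discards the point (2, 100.0), and the remaining points lie on the line y = 2x + 1, so the estimated slope and intercept are approximately 2 and 1.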
Published Services and Dashboards
The BigDataOcean platform enables the creation of queries, analyses and dynamic dashboards by the users. The users can choose to publish them and make them available to the rest of the users of the platform, either for free or through a subscription plan/fee. The BigDataOcean platform enables the users of the platform to browse through the published services and dashboards, in order to discover and retrieve the ones addressing their needs. The platform facilitates this by presenting to the users the list of their privately held services and dashboards, as well as those that are made publicly available within the platform’s scope. The following figure illustrates how the user can manage and browse the published services and dashboards of the BigDataOcean platform.