The first version of the integrated BigDataOcean platform was released as a low-fidelity, functional mock-up, offering the first set of services and features in accordance with the MVP and the integrated maritime data value chain of BigDataOcean. This first release was delivered to the end users for early assessment. The implementation activities for the integrated platform were driven by the outcomes and knowledge obtained during the requirements elicitation and analysis, the design of the platform’s conceptual architecture, and the definition and design of the platform components. The BigDataOcean services were built on top of the BigDataOcean infrastructure, together forming the first integrated BigDataOcean platform.
The cornerstone of the BigDataOcean platform is the underlying BigDataOcean infrastructure. This infrastructure is supported by a cloud infrastructure with advanced virtualisation capabilities, which enables the development and deployment of the platform while also ensuring security, identity trust and enhanced privacy. The BigDataOcean infrastructure consists of three main elements. At its heart lies the software platform for cloud computing, which facilitates the deployment of virtual servers and other instances and manages the available resources of the infrastructure. The Big Data Storage of the platform is supported by well-established storage solutions that ensure data integrity and correctness with high reliability and availability, as well as high scalability, fault tolerance and high-throughput access for efficient parallel processing. Finally, the cluster-computing framework offers a powerful data processing and query processing engine capable of executing advanced, complex or even interactive queries, as well as batch and stream processing workloads, with implicit data parallelism and fault tolerance. The BigDataOcean infrastructure is deployed on 7 virtual machines in total, each allocated the hardware resources needed to support the development and integration of the various components and services of the platform.
To assure the quality of the integration process of the BigDataOcean platform, several tools and techniques were identified and utilised, selected on the basis of their maturity and the benefits they offer to the integration process. For source code versioning and management, GitLab, which is built on the Git version control system, was selected. GitLab is also used for issue tracking and for continuous integration support through its specialised tools, such as the GitLab CI server. To safeguard the continuous integration process, automated build tools such as Maven, as well as powerful, well-established testing frameworks such as TestNG and unittest, are used.
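To illustrate the kind of unit test that a framework such as unittest supports, the following is a minimal sketch. The function under test, `normalise_longitude`, is a hypothetical helper invented for this example and is not part of the BigDataOcean code base; each test method checks one expected behaviour, so a failure points directly at the broken case.

```python
import unittest


def normalise_longitude(lon: float) -> float:
    """Hypothetical helper: wrap a longitude in degrees into the range [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


class NormaliseLongitudeTest(unittest.TestCase):
    def test_value_in_range_is_unchanged(self):
        self.assertAlmostEqual(normalise_longitude(23.5), 23.5)

    def test_eastern_overflow_wraps_to_west(self):
        self.assertAlmostEqual(normalise_longitude(190.0), -170.0)

    def test_western_overflow_wraps_to_east(self):
        self.assertAlmostEqual(normalise_longitude(-190.0), 170.0)

    def test_antimeridian_maps_to_lower_bound(self):
        # 180 and -180 denote the same meridian; the half-open range keeps -180.
        self.assertAlmostEqual(normalise_longitude(180.0), -180.0)
```

Saved as, for example, `test_geo.py`, the test case can be run locally or in a CI job with `python -m unittest test_geo`. The command exits with a non-zero status when a test fails, which is what allows a CI server to mark the job as failed.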
For the development of the integrated BigDataOcean platform, the consortium decided to adopt an incremental approach, and an integration strategy and plan was defined accordingly. The four main versions of the integrated platform will be released following an iterative process, where each release extends the set of offered functionalities while also addressing, through a set of updates and refinements, the issues identified in the previous release. Each release will follow the same integration cycle, which contains the following steps: a) the definition and update of the components of the architecture, b) the identification of the dependencies between the components, c) the definition of the integration points that will resolve the identified dependencies, together with the functionalities that each component will implement towards this end, and d) the integration plan, which takes all the previous steps into account and schedules the intermediate releases of the components towards a smooth system integration. Accordingly, each integration cycle consists of the following activities: a) the responsible partner undertakes the necessary actions for the implementation of the component, as well as for its testing (via unit tests), based on the designs and the specifications, b) the automated continuous integration tool, GitLab CI server, runs the integration tests of the released components, and issues arising from the failing tests are recorded in the issue tracking tool, and c) as soon as the integration tests succeed, system integration is performed: all produced components are deployed on the BigDataOcean infrastructure and system testing is carried out towards the release of the new version of the platform.
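Steps b) and d) of the integration cycle, identifying dependencies and then scheduling intermediate component releases, can be illustrated with a topological ordering. The sketch below uses Python’s standard `graphlib` module to group components into "waves", where every component in a wave depends only on components from earlier waves. The dependency map is a hypothetical example loosely based on the components named in this section, not the consortium’s actual integration plan.

```python
from graphlib import TopologicalSorter


def integration_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group components into waves: each component depends only on earlier waves,
    so all components of one wave can be integrated together."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical dependency map: component -> components it depends on.
DEPENDENCIES = {
    "Processing Engine": {"Big Data Storage"},
    "Query Designer": {"Big Data Storage", "Processing Engine"},
    "Dashboard Builder": {"Query Designer"},
    "Service/Analytics Builder": {"Processing Engine"},
    "Published Services": {"Dashboard Builder", "Service/Analytics Builder"},
}

waves = integration_waves(DEPENDENCIES)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

A circular dependency is reported as an error rather than silently ordered, which mirrors the purpose of step c): such a cycle has to be broken by redefining the integration points before a plan can be scheduled.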
In the first version of the integrated BigDataOcean platform, the main services were implemented with the aim of providing the envisioned functionalities of the platform and addressing the needs of the different stakeholder groups identified by the consortium. The services available in this first version are the Landing Page – Login, the Query Designer, the Dashboard Builder, the Service/Analytics Builder and the Published Services. These services are presented in the following paragraphs.
Landing Page – Login
The landing page introduces users to the BigDataOcean platform by presenting what the platform offers. It contains information about the available services of the platform, highlighting the benefits that the different stakeholder groups can obtain. On this page the user can also find additional information about the platform and the project. Finally, the page provides the sign-up/login entry point of the platform, where users can sign up or log in by providing their credentials.
Once logged in, the user is directed to the home page of the platform, where he/she is guided through the services offered by the platform via user-friendly navigation, or can search for services related to his/her activities. Through this easy-to-use navigation, the user can access the platform’s analytics toolbox in order to create queries, analyses or dashboards.
The Query Designer offers an intuitive graphical interface that facilitates the creation of custom queries with user-friendly tools, eliminating the need to write any SQL while at the same time offering rich expressive capabilities. Within the Query Designer’s environment, the user is able to build new queries or access previously saved ones, review the results of a query in raw or chart format, and apply filters on the results.
At first, the user can initiate the design of a new query or retrieve the list of saved queries in order to execute them again or modify them before execution. During the query design, the user is presented with the list of available datasets and their respective variables. The user is able to select the desired variables to be included in the query and can also add “group-by” and aggregation functions to the query. Once the variables are selected and the query is executed, the results are available in two formats. In the first format, the results are presented as raw data, where the variable values are displayed along with the related dimensions. In the second format, the results are visualised on a suitable chart, chosen according to the variable type, providing a better overview of the results of the query execution. Additionally, if multiple variables are selected and “group-by” and aggregation functions were added during the query design, the platform will try to “join” the variables on all their common dimensions. Finally, the user can introduce filters that will be applied on the query results. The filters are created through a graphical interface where filter expressions can be set on every column, and multiple filters can be created and applied at the same time. Figure 2 and Figure 3 illustrate the Query Designer’s environment and the visualisation of the query execution results.
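One way a graphical query designer can spare the user from writing SQL is to record the user’s choices in a small specification and translate it into SQL behind the scenes. The following is a minimal sketch of that idea, assuming a relational back end; the `QuerySpec` structure, the `to_sql` function, the set of aggregation names and the `wave_obs` table are all hypothetical, and the actual translation used by the platform is not described in this section. Filters are applied to the query results by wrapping the query as a subquery, and values are passed as bound parameters rather than pasted into the SQL text.

```python
import re
import sqlite3
from dataclasses import dataclass, field
from typing import Optional

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_AGGREGATES = {"avg": "AVG", "min": "MIN", "max": "MAX", "sum": "SUM", "count": "COUNT"}
_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def _ident(name: str) -> str:
    """Allow only plain identifiers, since table and column names cannot be bound as parameters."""
    if not _IDENT.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


@dataclass
class QuerySpec:
    """Hypothetical model of the choices a user makes in the graphical designer."""
    dataset: str
    variables: list[str]
    group_by: list[str] = field(default_factory=list)
    aggregation: Optional[str] = None
    filters: list[tuple[str, str, object]] = field(default_factory=list)  # (column, operator, value)


def to_sql(spec: QuerySpec) -> tuple[str, list[object]]:
    """Translate a QuerySpec into a parameterised SQL statement and its parameters."""
    if spec.group_by and not spec.aggregation:
        raise ValueError("group-by requires an aggregation function")
    if spec.aggregation:
        func = _AGGREGATES[spec.aggregation]
        columns = [_ident(d) for d in spec.group_by]
        columns += [f"{func}({_ident(v)}) AS {_ident(v)}" for v in spec.variables]
    else:
        columns = [_ident(v) for v in spec.variables]
    sql = f"SELECT {', '.join(columns)} FROM {_ident(spec.dataset)}"
    if spec.group_by:
        sql += " GROUP BY " + ", ".join(_ident(d) for d in spec.group_by)
    params: list[object] = []
    if spec.filters:
        # Filters act on the query results, so the query is wrapped as a subquery.
        conditions = []
        for column, op, value in spec.filters:
            if op not in _OPERATORS:
                raise ValueError(f"unsupported operator: {op!r}")
            conditions.append(f"{_ident(column)} {op} ?")
            params.append(value)
        sql = f"SELECT * FROM ({sql}) AS results WHERE " + " AND ".join(conditions)
    return sql, params


# Usage against an in-memory SQLite table holding hypothetical wave observations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wave_obs (obs_date TEXT, lat REAL, lon REAL, wave_height REAL)")
conn.executemany(
    "INSERT INTO wave_obs VALUES (?, ?, ?, ?)",
    [("2018-01-01", 37.9, 23.6, 1.2), ("2018-01-01", 38.0, 23.7, 1.6), ("2018-01-02", 37.9, 23.6, 2.4)],
)
spec = QuerySpec("wave_obs", ["wave_height"], group_by=["obs_date"], aggregation="avg",
                 filters=[("wave_height", ">", 1.5)])
sql, params = to_sql(spec)
rows = conn.execute(sql, params).fetchall()
```

Here the daily average wave height is computed first and the filter then keeps only the days whose average exceeds 1.5. Joining several variables on their common dimensions is deliberately left out of this sketch.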
The Dashboard Builder enables users to create dynamic dashboards and reports. Within a dynamic dashboard, multiple visualisations can be created on top of the user-defined queries, chosen from a predefined list of visualisation types that includes both charts and maps. These custom dashboards can be saved and accessed again at any time.
Each visualisation is created by selecting a query from the list of saved queries or by creating a new query. The visualisation type is then selected from the list of available types and the appropriate arguments for the visualisation are set. Once the visualisation is created, it can be added to the dashboard. In addition to common chart types, the user can also create visualisations that present data on maps, using either coloured contours or point markers. Figure 4 presents an example of a points-on-map visualisation and Figure 5 illustrates the creation of a dashboard consisting of a column chart and a contours-on-map visualisation.
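The relationship between dashboards, visualisations and saved queries described above can be sketched as a small data model. In the sketch below, a visualisation references a saved query by identifier, its type is checked against a predefined list, and the arguments it requires must be set before it is added; the dashboard can be serialised to JSON so that it can be saved and reloaded later. The type names, their required arguments and the JSON layout are assumptions made for illustration only.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical predefined visualisation types and the arguments each one requires.
VISUALISATION_TYPES = {
    "column_chart": {"x", "y"},
    "line_chart": {"x", "y"},
    "points_on_map": {"lat", "lon"},
    "contours_on_map": {"lat", "lon", "value"},
}


@dataclass
class Visualisation:
    query_id: int            # identifier of the saved query the visualisation is built on
    vis_type: str            # one of VISUALISATION_TYPES
    params: dict[str, str]   # maps each required argument to a column of the query results


@dataclass
class Dashboard:
    title: str
    visualisations: list[Visualisation] = field(default_factory=list)

    def add(self, vis: Visualisation) -> None:
        """Check the visualisation against its type before adding it to the dashboard."""
        required = VISUALISATION_TYPES.get(vis.vis_type)
        if required is None:
            raise ValueError(f"unknown visualisation type: {vis.vis_type!r}")
        missing = required - set(vis.params)
        if missing:
            raise ValueError(f"missing arguments for {vis.vis_type}: {sorted(missing)}")
        self.visualisations.append(vis)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "Dashboard":
        data = json.loads(text)
        return cls(data["title"], [Visualisation(**v) for v in data["visualisations"]])


# Usage: a dashboard with a column chart and a contours-on-map visualisation.
dash = Dashboard("Wave conditions")
dash.add(Visualisation(1, "column_chart", {"x": "obs_date", "y": "wave_height"}))
dash.add(Visualisation(2, "contours_on_map", {"lat": "lat", "lon": "lon", "value": "wave_height"}))
restored = Dashboard.from_json(dash.to_json())
```

Storing only a reference to the query, rather than its results, means a reloaded dashboard can re-execute the query and show current data.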
The Service and Analytics Builder enables the creation of custom services based on the set of analytical components/algorithms offered by the BigDataOcean platform. The analytical components are configurable processes defined in the platform, which the user can select in order to create a custom chain of analyses to be performed by the platform over the selected data. The Service and Analytics Builder offers a user-friendly graphical environment where the user can create an analytical flow from the selected components. For each selected component, the user can provide a custom configuration through a set of parameters that depends on the component. When multiple components are selected in sequence, a logical flow of the analysis is created and visualised. The analyses are then performed sequentially, forming a custom chain of analyses; for example, a pre-processing function can be applied before a model is trained to estimate a set of parameters. Once the analytical flow is executed, the results of the analysis are presented to the user together with the configuration that was defined.
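The sequential chain of configurable components can be sketched as follows. Each component is a function that receives the output of the previous step plus its own parameters, and the flow returns the final result together with the configuration used, as described above. The component registry, the two example components (a missing-value filter and an ordinary least squares line fit) and the `AnalysisFlow` class are hypothetical stand-ins, not the platform’s actual analytical components.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# A component takes the data produced by the previous step plus its own parameters.
Component = Callable[..., Any]


def drop_missing(rows, column_names=("x", "y")):
    """Pre-processing: discard rows in which any of the given columns is None."""
    return [r for r in rows if all(r[c] is not None for c in column_names)]


def fit_linear_model(rows, x="x", y="y"):
    """Training: ordinary least squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(r[x] for r in rows) / n
    mean_y = sum(r[y] for r in rows) / n
    sxx = sum((r[x] - mean_x) ** 2 for r in rows)
    sxy = sum((r[x] - mean_x) * (r[y] - mean_y) for r in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


# Hypothetical registry of the analytical components offered to the user.
REGISTRY: dict[str, Component] = {
    "drop_missing": drop_missing,
    "fit_linear_model": fit_linear_model,
}


@dataclass
class AnalysisFlow:
    steps: list[tuple[str, dict[str, Any]]] = field(default_factory=list)

    def add(self, name: str, **params: Any) -> "AnalysisFlow":
        """Append a configured component; returns self so calls can be chained."""
        if name not in REGISTRY:
            raise KeyError(f"unknown component: {name}")
        self.steps.append((name, params))
        return self

    def run(self, data: Any) -> dict[str, Any]:
        """Execute the steps in order, feeding each output into the next step."""
        for name, params in self.steps:
            data = REGISTRY[name](data, **params)
        return {"result": data, "configuration": list(self.steps)}


# Usage: pre-process the data, then train the model on what remains.
rows = [{"x": 0, "y": 1.0}, {"x": 1, "y": 3.0}, {"x": 2, "y": None}, {"x": 3, "y": 7.0}]
flow = AnalysisFlow().add("drop_missing", column_names=("x", "y")).add("fit_linear_model", x="x", y="y")
output = flow.run(rows)
```

After the row with a missing value is dropped, the remaining points lie on the line y = 2x + 1, so the fitted slope and intercept come out at approximately 2 and 1.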
Published Services and Dashboards
The BigDataOcean platform enables users to create queries, analyses and dynamic dashboards. Users can choose to publish them and make them available to the rest of the platform’s users, either for free or through a subscription plan or a fee. The platform enables its users to browse through the published services and dashboards in order to discover and retrieve the ones that address their needs. To facilitate this, it presents to each user the list of services and dashboards he/she owns, as well as the list of those publicly available on the platform. The following figure illustrates how the user can manage and browse the published services and dashboards of the BigDataOcean platform.
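The two lists presented to the user, the items he/she owns and the items other users have made public, can be sketched with a simple catalogue filter. The `PublishedItem` fields, the `Access` values mirroring the free, subscription and fee options, and the optional keyword search are hypothetical choices for illustration; the platform’s actual data model is not described here.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    FREE = "free"
    SUBSCRIPTION = "subscription"
    FEE = "fee"


@dataclass(frozen=True)
class PublishedItem:
    title: str
    kind: str          # "service" or "dashboard"
    owner: str
    public: bool
    access: Access = Access.FREE


def browse(items, user, keyword=""):
    """Return (items owned by the user, public items owned by others), optionally
    restricted to titles containing the keyword (case-insensitive)."""
    kw = keyword.lower()
    matches = [i for i in items if kw in i.title.lower()]
    owned = [i for i in matches if i.owner == user]
    available = [i for i in matches if i.public and i.owner != user]
    return owned, available


# Usage with a hypothetical catalogue.
catalogue = [
    PublishedItem("Wave forecast dashboard", "dashboard", "alice", True, Access.FREE),
    PublishedItem("Fuel consumption analysis", "service", "bob", True, Access.SUBSCRIPTION),
    PublishedItem("Draft port-traffic query", "service", "alice", False),
    PublishedItem("Vessel route dashboard", "dashboard", "bob", False),
]
owned, available = browse(catalogue, "alice")
```

In this sketch the user’s own public items appear only in the first list, so nothing is shown twice; private items of other users never appear.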