Storage Data Flow Management based on Machine Learning

SMML Project

Hybrid data storage systems, if utilized correctly, can be instrumental in meeting the increasing data storage and I/O demands of modern large-scale data analytics and HPC workloads. However, the complexity of data movement across the storage tiers and caches increases significantly, making it harder for applications to take advantage of the higher I/O performance offered by the system. The general objective of this project is to automate data flow management for caching and storage tiering in hybrid data storage solutions, relying on newly developed artificial intelligence algorithms, in order to achieve an optimal performance-to-cost ratio across storage media of different capacities. The proposed methodology combines, for the first time, mathematical modeling with streaming machine learning to guide the decisions of data storage systems. The expected outcome of the project could offer sustainable high performance at lower cost for these workloads. This project is done in collaboration with the Huawei Data Algorithm Technology Center of Huawei Russian Research Institute.

Distributed Tiered Storage for Cluster Computing

OctopusFS Architecture

Improvements in memory, storage devices, and network technologies are constantly exploited by distributed systems in order to meet the increasing data storage and I/O demands of modern large-scale data analytics. We present OctopusFS, a novel distributed file system that is aware of storage media (e.g., memory, SSDs, HDDs, NAS) with different capacities and performance characteristics. The system offers a variety of pluggable policies for automating data management across both the storage tiers and cluster nodes. A new data placement policy employs multi-objective optimization techniques for making intelligent data management decisions based on the requirements of fault tolerance, data and load balancing, and throughput maximization. Moreover, machine learning is employed for tracking and predicting file access patterns, which are then used by data movement policies to decide when and which data to move up or down the storage tiers for increasing system performance. This approach uses incremental learning along with XGBoost to dynamically refine the models with new file accesses and improve the prediction performance of the models. At the same time, the storage media are explicitly exposed to users and applications, allowing them to choose the distribution, placement, and movement of replicas in the cluster based on their own performance and fault tolerance requirements.
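As a rough illustration of how several placement objectives can be traded off against each other, the sketch below scores each storage medium with a simple weighted sum and picks replicas on distinct nodes. It is a minimal sketch under stated assumptions, not the OctopusFS placement policy: the `Medium` fields, the weights, and the one-replica-per-node rule are all hypothetical, and weighted-sum scalarization is only one of many multi-objective techniques.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Medium:
    """A storage medium on a cluster node (all fields are illustrative)."""
    node: str
    tier: str
    free_fraction: float    # remaining capacity, 0..1
    load: float             # current I/O load, 0..1
    throughput_mbps: float  # nominal throughput of the medium


def score(medium, max_throughput, weights):
    """Combine data balancing, load balancing, and throughput into one number."""
    w_balance, w_load, w_throughput = weights
    return (w_balance * medium.free_fraction
            + w_load * (1.0 - medium.load)
            + w_throughput * medium.throughput_mbps / max_throughput)


def place_replicas(media, replicas, weights=(1.0, 1.0, 1.0)):
    """Pick the highest-scoring media, using each node at most once."""
    max_tp = max(m.throughput_mbps for m in media)
    ranked = sorted(media, key=lambda m: score(m, max_tp, weights), reverse=True)
    chosen, used_nodes = [], set()
    for medium in ranked:
        if medium.node in used_nodes:
            continue  # fault tolerance: never put two replicas on one node
        chosen.append(medium)
        used_nodes.add(medium.node)
        if len(chosen) == replicas:
            break
    return chosen
```

Raising the third weight favors fast media such as memory, while raising the first two spreads data toward emptier and less loaded media.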

Scaling Transactional Databases with Strong Guarantees

Transaction Mgmt Diagram

Database replication is a common mechanism used for scaling performance and improving availability of transactional databases, but past approaches have suffered from various issues, including limited scalability, performance versus consistency tradeoffs, and requirements for database or application modifications. Hihooi is a replication-based middleware solution that is able to achieve workload scalability, strong consistency guarantees, and elasticity for existing transactional databases at a low cost. A novel replication algorithm enables Hihooi to propagate database modifications asynchronously to all replicas at high speeds, while ensuring that all replicas are consistent. At the same time, a fine-grained routing algorithm is used to load balance incoming transactions to available replicas in a consistent way. This project is done in collaboration with Dr. Michael Sirivianos from Cyprus University of Technology.
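One way to picture consistent routing over asynchronously updated replicas is to send a read only to a replica that has already applied every write touching the tables it reads. The sketch below illustrates that idea; it is not Hihooi's actual algorithm. The `Router` class, its table-level tracking, and the fallback to a `"primary"` node are assumptions made purely for illustration.

```python
class Router:
    """Route reads to replicas that are fresh for the tables being read."""

    def __init__(self, replicas):
        self.commit_seq = 0                       # sequence number of the latest write
        self.last_write = {}                      # table -> sequence of its latest write
        self.applied = {r: 0 for r in replicas}   # replica -> highest applied sequence
        self.active = {r: 0 for r in replicas}    # replica -> reads routed so far

    def record_write(self, tables):
        """Called when a write transaction commits on the primary."""
        self.commit_seq += 1
        for table in tables:
            self.last_write[table] = self.commit_seq
        return self.commit_seq

    def replica_applied(self, replica, seq):
        """Called when a replica reports it has applied writes up to seq."""
        self.applied[replica] = max(self.applied[replica], seq)

    def route_read(self, tables):
        """Pick the least-used replica that has seen all relevant writes."""
        needed = max((self.last_write.get(t, 0) for t in tables), default=0)
        fresh = [r for r, seq in self.applied.items() if seq >= needed]
        if not fresh:
            return "primary"  # no replica is up to date for these tables yet
        target = min(fresh, key=lambda r: self.active[r])
        self.active[target] += 1
        return target
```

Tracking freshness per table rather than per database lets reads on untouched tables use any replica, even while other writes are still propagating.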

Smart Cloud Caching for Data Intensive Applications

SMACC Project

As Cloud computing is gaining popularity among small and medium enterprises, Cloud storage solutions such as Amazon S3 are increasingly utilized for storing, maintaining, and serving application data. Despite the typically high-speed internet connections between applications and Cloud storage, there is still a huge performance gap compared to accessing data from direct-attached memory or even locally attached disks. SMACC is a novel Cloud caching service developed at CUT that can run on application compute nodes (e.g., on Amazon EC2) and cache frequently used data residing on Amazon S3 into local memory and locally attached disks (e.g., Amazon EBS) using new smart policies. SMACC also provides an HDFS-compatible API, which can be used by big data platforms such as Spark and Hadoop for processing data residing on Amazon S3, while caching data blocks on the various compute nodes for increased performance.
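The memory-then-disk lookup path described above can be sketched as a two-tier cache in front of a remote store. This is a minimal sketch under stated assumptions and not the SMACC implementation: plain LRU eviction stands in for SMACC's smart policies, both tiers are modelled as in-process dictionaries, and `TieredCache` and its `fetch` callback (standing in for a remote object read) are hypothetical names.

```python
from collections import OrderedDict


class TieredCache:
    """Two-tier LRU cache: a small memory tier backed by a larger disk tier."""

    def __init__(self, fetch, memory_slots, disk_slots):
        self.fetch = fetch            # callable that reads a key from remote storage
        self.memory = OrderedDict()   # fastest tier, least recently used first
        self.disk = OrderedDict()     # slower tier, least recently used first
        self.memory_slots = memory_slots
        self.disk_slots = disk_slots

    def get(self, key):
        """Return (value, source), where source is the tier that served the read."""
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as most recently used
            return self.memory[key], "memory"
        if key in self.disk:
            value, source = self.disk.pop(key), "disk"
        else:
            value, source = self.fetch(key), "remote"
        self._admit(key, value)
        return value, source

    def _admit(self, key, value):
        """Insert into memory, demoting the LRU entry to disk when memory is full."""
        self.memory[key] = value
        if len(self.memory) > self.memory_slots:
            old_key, old_value = self.memory.popitem(last=False)
            self.disk[old_key] = old_value
            if len(self.disk) > self.disk_slots:
                self.disk.popitem(last=False)  # evicted entirely; next read is remote
```

Demoting rather than dropping entries means that a block pushed out of memory is still served locally, avoiding a round trip to remote storage.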

Real-time Aggression Detection on Social Media

Aggression Detection Project

The rise of online aggression on social media is evolving into a major point of concern. Several machine and deep learning approaches have been proposed recently for detecting various types of aggressive behavior. However, social media are fast paced, generating an increasing amount of content, while aggressive behavior evolves over time. We introduce the first practical, real-time framework for detecting aggression on Twitter by embracing the streaming machine-learning paradigm. The framework is designed to be adaptable (its ML classifiers are trained incrementally as they receive new annotated examples), scalable (it can process the entire Twitter Firehose with three machines), and generalizable (it can detect other abusive behaviors such as sarcasm, racism, and sexism in real time). This project is done in collaboration with Dr Nicolas Kourtellis from Telefonica Research, Spain and Dr Despoina Chatzakou from Centre for Research and Technology Hellas, Greece.
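To make the streaming machine-learning paradigm concrete, the sketch below trains a text classifier one labelled example at a time, so the model can keep adapting as new annotations arrive. It is a minimal illustration and not the framework's actual classifiers or features: online logistic regression over hashed bag-of-words tokens is a stand-in, and the class name, dimensions, and learning rate are assumptions.

```python
import math
import zlib


class OnlineTextClassifier:
    """Logistic regression over hashed tokens, updated one example at a time."""

    def __init__(self, dims=2 ** 16, learning_rate=0.5):
        self.weights = [0.0] * dims
        self.bias = 0.0
        self.dims = dims
        self.learning_rate = learning_rate

    def _features(self, text):
        # Hashing trick: map each distinct token to a fixed-size index space.
        # crc32 is used because it is stable across Python processes.
        return {zlib.crc32(tok.encode("utf-8")) % self.dims
                for tok in text.lower().split()}

    def predict_proba(self, text):
        """Estimated probability that the text belongs to the positive class."""
        z = self.bias + sum(self.weights[i] for i in self._features(text))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, text, label):
        """One stochastic gradient step on a single (text, label) example."""
        error = self.predict_proba(text) - label
        self.bias -= self.learning_rate * error
        for i in self._features(text):
            self.weights[i] -= self.learning_rate * error
```

In a streaming setting, each incoming post would be scored with `predict_proba`, and `learn_one` would be called whenever a newly annotated example becomes available.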

Intelligent Vessel Monitoring with AIS


The Automatic Identification System (AIS) is an automatic tracking system used on ships and by Vessel Traffic Services (VTS) for monitoring vessel movements in real time. AIS signals are sent at regular intervals and contain encoded information about a vessel, including its unique identification, position coordinates, speed and course over ground, and next port destination. The CUT-AIS Ship Tracking Intelligence Platform is a web-based platform that exploits AIS data signals to provide meaningful representations, graphs, and data analytics to the end user. The platform consumes data in real time from three sources: (i) a base station consisting of a VHF antenna, a receiver, and a Raspberry Pi installed on the premises of CUT; (ii) an AIS stream provided by the Cyprus Shipping Deputy Ministry; and (iii) base stations operated by Tototheo Maritime around the coast of Cyprus. AISafety is another web-based platform for monitoring and visualizing ship traffic in the general area of the Eastern Mediterranean sea using AIS data. The key feature of the platform is generating valid, real-time warnings for ships entering and leaving various areas of interest, as well as for potential ship collisions.
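A common way to flag potential collisions from position, speed, and course reports is the closest point of approach (CPA) between two vessels. The sketch below shows that calculation; it is an illustration and not necessarily how AISafety derives its warnings. It assumes positions already projected to local planar coordinates in metres (east, north), speeds in metres per second rather than knots, straight-line motion, and hypothetical threshold values.

```python
import math


def velocity(speed_mps, course_deg):
    """Convert speed and course over ground (degrees clockwise from north)
    into (east, north) velocity components."""
    rad = math.radians(course_deg)
    return speed_mps * math.sin(rad), speed_mps * math.cos(rad)


def closest_approach(pos_a, vel_a, pos_b, vel_b):
    """Return (seconds until closest approach, distance in metres at that time)."""
    rx, ry = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]  # relative position
    vx, vy = vel_b[0] - vel_a[0], vel_b[1] - vel_a[1]  # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        t = 0.0  # same velocity: the separation never changes
    else:
        # Minimize |r + v*t|; a negative t means the vessels are already diverging.
        t = max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return t, math.hypot(rx + vx * t, ry + vy * t)


def collision_warning(pos_a, vel_a, pos_b, vel_b,
                      min_distance_m=500.0, horizon_s=1800.0):
    """Warn if the vessels will pass closer than min_distance_m within horizon_s."""
    t, distance = closest_approach(pos_a, vel_a, pos_b, vel_b)
    return distance < min_distance_m and t <= horizon_s
```

For example, two vessels 1000 m apart sailing directly toward each other at 5 m/s each reach their closest approach after 100 seconds, which would trigger a warning under the assumed thresholds.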

Environmental Quality Monitoring & Analysis

Air Quality Data Monitoring and Analysis

CUT Environmental Monitoring is an intelligence platform that is developed and maintained by DICL in collaboration with Dr. Michalis P. Michaelides from Cyprus University of Technology. It enables users to access, extract, and analyze air quality, water quality, and meteorological data. Data can be viewed or extracted through a table dashboard, where users can request the specific data they are interested in. The data can also be viewed through a user-friendly live map. Finally, the platform provides a set of graphs that present key statistics regarding the collected data. Overall, the CUT Environmental Monitoring platform allows end users to monitor parameters related to air and water pollution.

Data-Driven Tourist Destination Marketing

Tourist Destination Marketing Project

This project employs a machine-learning approach to tourist destination marketing campaigns, analyzing tourists’ reviews from TripAdvisor to identify significant patterns in the data. The proposed methodology combines topic modelling using Structured Topic Analysis with sentiment polarity, information on culture, and the purchasing power of tourists for the development of Decision Trees (DTs) at different levels of granularity. The goal is to identify patterns in tourists’ accommodation experiences and potential reasons for their dissatisfaction and satisfaction, which in turn can improve destination marketing and optimize a destination’s profitability. This project is done in collaboration with Dr Andreas Gregoriades from Cyprus University of Technology and Dr Maria Pampaka from The University of Manchester, UK.

Maritime Cognitive Decision Support System

MARI-Sense Project

The primary objective of the MARI-Sense project is to integrate and adapt existing expertise, and to develop novel knowledge and skills, in order to build the MARI-Sense Cognitive Decision Support System for Maritime Activities Planning, Emergency Response and Planning, and Maritime Spatial Planning. The secondary objective is the development and implementation of strategies for smart, sustainable, and inclusive growth with a beneficial impact on society, technology, and the economy, powered by the diverse capabilities of members of the quadruple helix and the general public. The project is co-funded by the European Regional Development Fund and the Republic of Cyprus through the Research and Innovation Foundation (RIF) with a total budget of 1M Euros.

Sea Traffic Management in the Eastern Mediterranean

STEAM Project

The general objective of the STEAM (Sea Traffic Management in the Eastern Mediterranean) project is the efficient management of sea traffic in the Eastern Mediterranean sea, while at the same time ensuring safety and environmental sustainability. More specifically, the project aims to develop the Port of Limassol into (i) a world-class transshipment and information hub adopting modern digital technologies brought to the maritime sector, and (ii) a driver for short sea shipping in the Eastern Mediterranean through enhanced services based on standardized ship and port connectivity. The project is co-funded by the European Regional Development Fund and the Republic of Cyprus through the Research and Innovation Foundation (RIF) with a total budget of 1M Euros.
