Distributed Tiered Storage for Cluster Computing
Improvements in memory, storage devices, and network technologies are constantly exploited by distributed systems in order to meet the increasing data storage and I/O demands of modern large-scale data analytics. We present OctopusFS, a novel distributed file system that is aware of storage media (e.g., memory, SSDs, HDDs, NAS) with different capacities and performance characteristics. The system offers a variety of pluggable policies for automating data management across both the storage tiers and cluster nodes. A new data placement policy employs multi-objective optimization techniques for making intelligent data management decisions based on the requirements of fault tolerance, data and load balancing, and throughput maximization. Moreover, machine learning is employed for tracking and predicting file access patterns, which are then used by data movement policies to decide when and which data to move up or down the storage tiers for increasing system performance. This approach uses incremental learning along with XGBoost to dynamically refine the models with new file accesses and improve the prediction performance of the models. At the same time, the storage media are explicitly exposed to users and applications, allowing them to choose the distribution, placement, and movement of replicas in the cluster based on their own performance and fault tolerance requirements.
While the use of storage tiering is becoming popular in data-intensive compute clusters, current big data platforms (such as Hadoop and Spark) are not exploiting the presence of storage tiers and the opportunities they present for performance optimizations. Specifically, schedulers and prefetchers will make decisions only based on data locality information and completely ignore the fact that local data are now stored on a variety of storage media with different performance characteristics. We propose Trident, a scheduling and prefetching framework that is designed to make task assignment, resource scheduling, and prefetching decisions based on both locality and storage tier information. Trident formulates task scheduling as a minimum cost maximum matching problem in a bipartite graph and utilizes two novel pruning algorithms for bounding the size of the graph, while still guaranteeing optimality. In addition, Trident extends YARN’s resource request model and proposes a new storage-tier-aware resource scheduling algorithm. Finally, Trident includes a cost-based data prefetching approach that coordinates with the schedulers for optimizing prefetching operations.