Quick generation of SSD performance models using machine learning
Tarihi, M ; Sharif University of Technology | 2022
- Type of Document: Article
- DOI: 10.1109/TETC.2021.3116197
- Publisher: IEEE Computer Society, 2022
- Abstract:
- Increasing usage of Solid-State Drives (SSDs) has greatly boosted the performance of storage backends. SSDs perform many internal processes such as out-of-place writes, wear-leveling, and garbage collection. These operations are complex and not well documented, which makes it difficult to create accurate SSD simulators. Our survey indicates that aside from complex configuration, available SSD simulators do not support both sync and discard requests. Past performance models also ignore the long-term effect of I/O requests on SSD performance, which has been demonstrated to be significant. In this article, we utilize a methodology based on machine learning that extracts history-aware features at low cost to train SSD performance models that predict request response times. A key goal of our work is to achieve real-time or near-real-time feature extraction and practical training times, so our work can be considered as part of solutions that perform online or periodical characterization such as adaptive storage algorithms. Thus, we extract features from individual read, write, sync, and discard I/O requests and use structures such as exponentially decaying counters to track past activity using O(1) memory and processing cost. To make our methodology accessible and usable in real-world online scenarios, we focus on machine learning models that can be trained quickly on a single machine. To massively reduce processing and memory cost, we utilize feature selection to reduce feature count by up to 63%, allowing a feature extraction rate of 313,000 requests per second using a single thread. Our dataset contains 580M requests taken from 35 workloads. We experiment with three families of machine learning models: a) decision trees, b) ensemble methods utilizing decision trees, and c) Feedforward Neural Networks (FNN).
Based on these experiments, the FNN achieves an average R² score of 0.72, compared to 0.61 and 0.45 for the Random Forest and Bagging models, respectively, where R² ∈ (−∞, 1] and a score of 1 indicates a perfect fit. However, while the random forest model has lower accuracy, it uses general processing hardware and can be trained much faster, making it viable for use in online scenarios. © 2013 IEEE
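The exponentially decaying counters mentioned in the abstract can be sketched as follows; this is a hypothetical illustration (not the authors' code) of how past I/O activity can be tracked with O(1) memory and O(1) update cost per request, since only a single running value is stored regardless of history length:

```python
class DecayingCounter:
    """Tracks recent activity; older observations fade out geometrically."""

    def __init__(self, decay=0.9):
        self.decay = decay   # per-update decay factor in (0, 1)
        self.value = 0.0     # single scalar of state -> O(1) memory

    def update(self, amount=1.0):
        # Decay the accumulated history, then add the new observation.
        self.value = self.value * self.decay + amount
        return self.value


# Example: counting recent write requests with decay 0.5.
writes = DecayingCounter(decay=0.5)
for _ in range(3):
    writes.update(1.0)
# After three updates: 1*0.25 + 1*0.5 + 1 = 1.75
print(writes.value)
```

Such a counter could serve as one history-aware feature per request type (read, write, sync, discard); the decay factor and update amounts here are illustrative choices, not values from the paper.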
- Keywords:
- Machine learning ; Neural networks ; Performance prediction ; Solid state drives ; Feature extraction ; Feedforward neural networks ; Forestry ; Learning algorithms ; Learning systems ; Scheduling algorithms ; Computational modelling ; Performance modeling ; Performance evaluation ; Predictive models ; Time factors ; Decision trees
- Source: IEEE Transactions on Emerging Topics in Computing ; Volume 10, Issue 4, 2022, Pages 1821-1836 ; 21686750 (ISSN)
- URL: https://ieeexplore.ieee.org/document/9557842