To familiarize yourself with the project, please check our docs on how to Get Started.
Machine learning models are GIGO (garbage in, garbage out): they are only as good as the data they are built on. As data scientists, we rely heavily on storage system subject matter experts (SMEs) to provide domain knowledge about which pieces of data can be powerful indicators of failure. To provide feedback as an SME, you can open an issue on the project repo, and prefix the issue title with
Currently, we have several ML sub-projects/workstreams in progress. The main ones are as follows:
- Hard Drive Failure Prediction
- SMART Metric Forecasting - Issues #11, #14, #15, #17, #18
- Exploring Ceph Telemetry Dataset - Issues #29, #32, #33
- Exploring FAST Dataset - Issues #43, #44, #45
- Disk Health Analytics Module - Issues #27, #28
To learn more about these ML workstreams, please check out the linked issues or the content docs. If you wish to work on any of these workstreams, please leave a comment on the respective issue or create an issue if one doesn’t exist. Once you have been assigned to the issue, you can work on it and submit a Pull Request to the project repo.