Categorical Encoding Techniques

Categorical variables are string-valued columns in a dataset, such as product names, alert names, log file keys, and variables in Linux configuration files. They need careful handling because they must be converted to numbers before most models can use them.

In this project, we focus on encoding schemes for nominal categorical variables. These variables have no inherent ordering or trend among their categories; for example, weather can be rainy, sunny, snowy, and so on. Encoding them as numbers is challenging because we want to avoid distorting the distances between the levels (categories) of a variable while also retaining explainability. We therefore look for encoders that best balance the trade-off between performance and explainability.
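To make the distance-distortion concern concrete, here is a minimal sketch in plain Python that compares two common encodings on the weather example. It is an illustration only, not this project's implementation: the helper names `ordinal_encode`, `one_hot_encode`, and `squared_distance` are hypothetical, and the alphabetical category order is an arbitrary assumption.

```python
def ordinal_encode(values):
    """Assign each category an integer by alphabetical order (arbitrary choice)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return categories, [index[v] for v in values]


def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector with a single 1."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical sample data based on the weather example in the text.
weather = ["rainy", "sunny", "snowy", "sunny"]

categories, ordinal = ordinal_encode(weather)
_, one_hot = one_hot_encode(weather)

# Ordinal codes imply that "rainy" is closer to "snowy" than to "sunny",
# an ordering the data does not actually have.
rainy_sunny_ordinal = abs(ordinal[0] - ordinal[1])
rainy_snowy_ordinal = abs(ordinal[0] - ordinal[2])

# One-hot vectors keep every pair of distinct categories equally far apart.
rainy_sunny_one_hot = squared_distance(one_hot[0], one_hot[1])
rainy_snowy_one_hot = squared_distance(one_hot[0], one_hot[2])
```

In this sketch the integer codes place the categories at unequal distances from one another, while the one-hot vectors keep all distinct categories equidistant. The cost of one-hot encoding is one column per category, which is part of the trade-off between performance and explainability described above.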


This project is maintained as part of the AIOps team in Red Hat’s AI CoE, within the Office of the CTO. More information can be found at https://www.operate-first.cloud/.