Optimal Stopping Point for CI Tests

One of the machine learning explorations within the OpenShift CI Analysis project is predicting the optimal stopping point for CI tests based on their test duration (runtime) (see this issue for more details). In a previous notebook, we showed how to access the TestGrid data and performed initial data analysis and feature engineering on it. We also calculated the optimal stopping point by identifying the distribution of the test_duration values for different CI tests and comparing the distributions of passing and failing tests.

In this notebook, we will detect the optimal stopping point for different CI tests taken as inputs.

[1]

## Import libraries
import os
import gzip
import json
import datetime
import itertools
import scipy  # noqa F401
from scipy.stats import (  # noqa F401
    invgauss,
    lognorm,
    pearson3,
    weibull_min,
    triang,
    beta,
    norm,
    weibull_max,
    uniform,
    gamma,
    expon,
)

from ipynb.fs.defs.osp_helper_functions import (
    CephCommunication,
    fit_distribution,
    standardize,
    filter_test_type,
    fetch_all_tests,
    best_distribution,
    optimal_stopping_point,
)
import warnings

warnings.filterwarnings("ignore")

Ceph

Connection to Ceph for importing the TestGrid data

[2]

## Specify variables
METRIC_NAME = "time_to_fail"

# Specify the path for input grid data
INPUT_DATA_PATH = "../../data/raw/testgrid_258.json.gz"

# Specify the path for output metric data
OUTPUT_DATA_PATH = f"../../../../data/processed/metrics/{METRIC_NAME}"

## CEPH Bucket variables
## Create a .env file on your local machine with the correct configs
s3_endpoint_url = os.getenv("S3_ENDPOINT")
s3_access_key = os.getenv("S3_ACCESS_KEY")
s3_secret_key = os.getenv("S3_SECRET_KEY")
s3_bucket = os.getenv("S3_BUCKET")
s3_path = os.getenv("S3_PROJECT_KEY", "metrics")
s3_input_data_path = "raw_data"

# Specify whether or not we are running this as a notebook or part of an automation pipeline.
AUTOMATION = os.getenv("IN_AUTOMATION")

[3]

## Import data
timestamp = datetime.datetime.today()

if AUTOMATION:
    filename = f"testgrid_{timestamp.day}{timestamp.month}.json"
    cc = CephCommunication(s3_endpoint_url, s3_access_key, s3_secret_key, s3_bucket)
    s3_object = cc.s3_resource.Object(s3_bucket, f"{s3_input_data_path}/{filename}")
    file_content = s3_object.get()["Body"].read().decode("utf-8")
    testgrid_data = json.loads(file_content)

else:
    with gzip.open(INPUT_DATA_PATH, "rb") as read_file:
        testgrid_data = json.load(read_file)

Fetch all tests

Using the function fetch_all_tests, we will fetch all passing and failing tests into two dataframes.
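
As a rough illustration of what this step produces, the sketch below splits a small, made-up table of test runs by status code. The `status` column, the toy rows, and the `split_by_status` helper are assumptions for illustration only; the real `fetch_all_tests` works on the nested TestGrid JSON and may be implemented differently.

```python
import pandas as pd

# Status codes used in this notebook: 1 = passing, 12 = failing.
PASSING, FAILING = 1, 12

# Toy rows standing in for flattened TestGrid cells (hypothetical data).
runs = pd.DataFrame(
    {
        "test": ["Overall", "Overall", "e2e-upgrade", "e2e-upgrade"],
        "test_duration": [95.3, 20.0, 101.2, 85.9],
        "status": [PASSING, FAILING, PASSING, FAILING],
    }
)


def split_by_status(df, status_code):
    """Return only the rows whose status matches the given code."""
    return df[df["status"] == status_code].reset_index(drop=True)


toy_failures = split_by_status(runs, FAILING)
toy_passing = split_by_status(runs, PASSING)
print(len(toy_failures), len(toy_passing))
```

Keeping the two outcomes in separate frames makes it straightforward to fit and compare their duration distributions later on.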

[4]

# Fetch all failing tests, i.e., those with a status code of 12
failures_df = fetch_all_tests(testgrid_data, 12)

[5]

failures_df.head()

	timestamp	tab	grid	test	test_duration	failure/passing
8	2021-08-16 23:03:14	"redhat-assisted-installer"	periodic-ci-openshift-release-master-nightly-4...	Overall	20.016667	True
10	2021-08-16 00:01:05	"redhat-assisted-installer"	periodic-ci-openshift-release-master-nightly-4...	Overall	108.233333	True
22	2021-08-16 23:03:14	"redhat-assisted-installer"	periodic-ci-openshift-release-master-nightly-4...	operator.Run multi-stage test e2e-metal-assist...	13.166667	True
24	2021-08-16 00:01:05	"redhat-assisted-installer"	periodic-ci-openshift-release-master-nightly-4...	operator.Run multi-stage test e2e-metal-assist...	89.983333	True
38	2021-08-16 00:01:05	"redhat-assisted-installer"	periodic-ci-openshift-release-master-nightly-4...	TestInstall_test_install.start_install_and_wai...	60.004001	True

[6]

# Fetch all passing tests, i.e., those with a status code of 1
passing_df = fetch_all_tests(testgrid_data, 1)

[7]

passing_df.head()

	timestamp	tab	grid	test	test_duration	failure/passing
1	2021-08-23 00:01:04	"redhat-assisted-installer"	periodic-ci-openshift-release-master-nightly-4...	Overall	95.300000	True
2	2021-08-22 08:53:17	"redhat-assisted-installer"	periodic-ci-openshift-release-master-nightly-4...	Overall	101.800000	True
3	2021-08-20 23:21:32	"redhat-assisted-installer"	periodic-ci-openshift-release-master-nightly-4...	Overall	134.833333	True
4	2021-08-20 15:57:36	"redhat-assisted-installer"	periodic-ci-openshift-release-master-nightly-4...	Overall	109.833333	True
5	2021-08-20 06:47:40	"redhat-assisted-installer"	periodic-ci-openshift-release-master-nightly-4...	Overall	94.800000	True

Filter tests

After collecting the data for all passing and failing tests, we narrow down to a single test for which we want to calculate the optimal stopping point. We will use the test "operator.Run multi-stage test e2e-aws-upgrade - e2e-aws-upgrade-openshift-e2e-test container test" and extract the data for it.

[8]

failures_test = filter_test_type(
    failures_df,
    "operator.Run multi-stage test e2e-aws-upgrade - e2e-aws-upgrade-openshift-e2e-test container test",
)
failures_test.head()

	timestamp	tab	grid	test	test_duration	failure/passing
0	2021-08-25 12:17:53	"redhat-openshift-informing"	release-openshift-okd-installer-e2e-aws-upgrade	operator.Run multi-stage test e2e-aws-upgrade ...	85.866667	True
1	2021-08-25 10:30:05	"redhat-openshift-informing"	release-openshift-okd-installer-e2e-aws-upgrade	operator.Run multi-stage test e2e-aws-upgrade ...	91.916667	True
2	2021-08-25 04:41:24	"redhat-openshift-informing"	release-openshift-okd-installer-e2e-aws-upgrade	operator.Run multi-stage test e2e-aws-upgrade ...	101.133333	True
3	2021-08-24 20:03:02	"redhat-openshift-informing"	release-openshift-okd-installer-e2e-aws-upgrade	operator.Run multi-stage test e2e-aws-upgrade ...	98.450000	True
4	2021-08-24 04:35:23	"redhat-openshift-informing"	release-openshift-okd-installer-e2e-aws-upgrade	operator.Run multi-stage test e2e-aws-upgrade ...	93.216667	True

[9]

passing_test = filter_test_type(
    passing_df,
    "operator.Run multi-stage test e2e-aws-upgrade - e2e-aws-upgrade-openshift-e2e-test container test",
)
passing_test.head()

	timestamp	tab	grid	test	test_duration	failure/passing
0	2021-08-25 13:06:02	"redhat-openshift-informing"	release-openshift-okd-installer-e2e-aws-upgrade	operator.Run multi-stage test e2e-aws-upgrade ...	101.250000	True
1	2021-08-25 07:15:39	"redhat-openshift-informing"	release-openshift-okd-installer-e2e-aws-upgrade	operator.Run multi-stage test e2e-aws-upgrade ...	94.283333	True
2	2021-08-25 06:08:52	"redhat-openshift-informing"	release-openshift-okd-installer-e2e-aws-upgrade	operator.Run multi-stage test e2e-aws-upgrade ...	90.316667	True
3	2021-08-25 02:54:53	"redhat-openshift-informing"	release-openshift-okd-installer-e2e-aws-upgrade	operator.Run multi-stage test e2e-aws-upgrade ...	93.866667	True
4	2021-08-24 22:40:00	"redhat-openshift-informing"	release-openshift-okd-installer-e2e-aws-upgrade	operator.Run multi-stage test e2e-aws-upgrade ...	92.900000	True

Fit Distribution

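After extracting the data for one test, we want to find the distribution that best fits its test_duration values, since the optimal stopping point calculation relies on it. We fit each candidate distribution and rank the candidates by their chi-square statistic and p-value, where a lower chi-square statistic indicates a better fit.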

[10]

failure_dist, failures_r = fit_distribution(failures_test, "test_duration", 0.99, 0.01)


Distributions listed by Betterment of fit:
............................................
   Distribution     chi_square and p-value
3          beta  (2148.0315961744586, 0.0)
9      pearson3   (2150.964892187448, 0.0)
1          norm   (2178.439189095538, 0.0)
8       lognorm   (2190.171386750302, 0.0)
6         gamma  (2251.5768352345144, 0.0)
0   weibull_min  (2335.2881528000057, 0.0)
2   weibull_max  (2436.7340969950874, 0.0)
4      invgauss  (2581.7529201615253, 0.0)
10       triang   (3168.817214371956, 0.0)
5       uniform  (5205.7686822999685, 0.0)
7         expon   (7308.400793415922, 0.0)
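
The internals of `fit_distribution` are not shown here, so the following is only a minimal sketch of the general technique: fit a few `scipy.stats` distributions to standardized durations, bin the data, and rank the candidates by a chi-square statistic. The synthetic data, the choice of ten percentile-based bins, and the reduced candidate list are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic durations standing in for one test's `test_duration` column.
durations = rng.gamma(shape=9.0, scale=10.0, size=500)

# Standardize so that every candidate is fitted on the same scale.
z = (durations - durations.mean()) / durations.std()

# Ten bins whose edges are the observed deciles (an assumed binning scheme).
edges = np.percentile(z, np.linspace(0, 100, 11))
observed, _ = np.histogram(z, bins=edges)

results = {}
for name in ["norm", "expon", "uniform"]:
    dist = getattr(stats, name)
    params = dist.fit(z)
    # Expected count per bin, taken from the fitted CDF.
    expected = np.diff(dist.cdf(edges, *params)) * len(z)
    chi_square = float(
        np.sum((observed - expected) ** 2 / np.maximum(expected, 1e-9))
    )
    # Degrees of freedom: bins - 1 - number of fitted parameters.
    dof = len(observed) - 1 - len(params)
    p_value = float(stats.chi2.sf(chi_square, dof))
    results[name] = (chi_square, p_value)

# Lower chi-square means the fitted curve tracks the histogram more closely.
ranking = sorted(results, key=lambda n: results[n][0])
for name in ranking:
    print(name, results[name])
```

The statistic is computed by hand here because the expected counts of an unbounded distribution do not sum exactly to the sample size over a finite set of bins.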

[11]

# Identify the best fit distribution from the failing test along with its corresponding distribution parameters
best_dist, parameters_failing = best_distribution(failure_dist, failures_r)

[12]

# Identify the distributions for the passing test along with its corresponding distribution parameters
passing_dist, passing_r = fit_distribution(passing_test, "test_duration", 0.99, 0.01)


Distributions listed by Betterment of fit:
............................................
   Distribution                         chi_square and p-value
10       triang     (461.9624452114939, 4.799796517458444e-69)
3          beta    (619.2886679176573, 2.716009412709153e-100)
2   weibull_max   (782.1495727872282, 2.4780499811803997e-133)
5       uniform   (800.7205543332755, 3.9377128547833523e-137)
9      pearson3     (902.4903827437414, 4.87937692523532e-158)
0   weibull_min   (961.9366558978377, 2.6033191498811774e-170)
6         gamma   (1025.0253918219983, 2.234474698949537e-183)
8       lognorm  (1063.4355506988115, 2.3726995807065007e-191)
1          norm    (1066.204889931689, 6.306900543032179e-192)
4      invgauss     (1076.96978515332, 3.650894820526847e-194)
7         expon                      (2457.1474484587093, 0.0)

[13]

passing_r.head()

	Distribution	chi_square and p-value
10	triang	(461.9624452114939, 4.799796517458444e-69)
3	beta	(619.2886679176573, 2.716009412709153e-100)
2	weibull_max	(782.1495727872282, 2.4780499811803997e-133)
5	uniform	(800.7205543332755, 3.9377128547833523e-137)
9	pearson3	(902.4903827437414, 4.87937692523532e-158)

[14]

# Identify the best fit distribution from the passing test
best_distribution(passing_dist, passing_r)

('weibull_min', [12.20521715428722, -9.947987307617899, 10.381897325624372])

After finding the best-fit distribution for the failing tests, we look up the parameters that the same distribution takes when fitted to the passing tests.

[15]

# Find the corresponding passing test distribution parameters for the
# best fit distribution identified from the failing test above
parameters_passing = passing_dist[passing_dist["Distribution Names"] == best_dist][
    "Parameters"
].values
parameters_passing = list(itertools.chain(*parameters_passing))
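
To make the lookup above concrete, here is a small, self-contained sketch on a toy frame. The column names match the ones used in the cell above, while the rows and the `best_name` value are made up for illustration.

```python
import itertools

import pandas as pd

# Toy stand-in for the frame returned by the fitting step (made-up values).
fitted = pd.DataFrame(
    {
        "Distribution Names": ["norm", "beta", "expon"],
        "Parameters": [(0.0, 1.0), (2.0, 5.0, -1.5, 6.0), (-1.5, 1.5)],
    }
)

best_name = "beta"  # pretend this was the best fit on the failing runs

# Boolean-mask selection returns an array holding one tuple of parameters...
selected = fitted[fitted["Distribution Names"] == best_name]["Parameters"].values
# ...so chain() unpacks that single tuple into a flat list of floats.
params = list(itertools.chain(*selected))
print(params)  # [2.0, 5.0, -1.5, 6.0]
```

Flattening matters because the selection yields a one-element array wrapping the parameter tuple, whereas downstream code expects a plain list of numbers.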

[16]

# Standardize the features by removing the mean and scaling to unit variance
y_std_failing, len_y_failing, y_failing = standardize(
    failures_test, "test_duration", 0.99, 0.01
)

[17]

# Standardize the features by removing the mean and scaling to unit variance
y_std_passing, len_y_passing, y_passing = standardize(
    passing_test, "test_duration", 0.99, 0.01
)
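
The `standardize` helper is imported rather than defined in this notebook. A minimal sketch of what such a step could look like is given below, assuming the two numeric arguments are upper and lower rank cut-offs used to trim outliers before z-scoring. The function name `standardize_sketch` and the toy data are hypothetical.

```python
import numpy as np


def standardize_sketch(values, upper=0.99, lower=0.01):
    """Trim the tails by rank, then z-score what is left (hypothetical helper)."""
    y = np.asarray(values, dtype=float)
    y = np.sort(y[~np.isnan(y)])
    n = len(y)
    # Keep the values ranked between the lower and upper cut-offs.
    y = y[int(lower * n): int(upper * n)]
    # Remove the mean and scale to unit variance.
    y_std = (y - y.mean()) / y.std()
    return y_std, len(y), y


# 198 ordinary durations plus two extreme outliers.
durations = np.concatenate([np.linspace(80.0, 110.0, 198), [5.0, 900.0]])
y_std, n_kept, y = standardize_sketch(durations)
print(n_kept, y.min(), y.max())
```

Trimming before scaling keeps a handful of extreme runs from dominating the mean and variance used for the fit.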

Optimal Stopping Point Calculation

Next, we find the optimal stopping point for the test by passing in the name of the best distribution, the standardized and raw duration values of the failing and passing runs, and the corresponding distribution parameters.

[18]

osp = optimal_stopping_point(
    best_dist,
    y_std_failing,
    y_failing,
    parameters_failing,
    y_std_passing,
    y_passing,
    parameters_passing,
)

[19]

# Optimal Stopping Point for `operator.Run multi-stage test e2e-aws-upgrade
# - e2e-aws-upgrade-openshift-e2e-test container test`
osp

104.3979969544608

This tells us that the optimal stopping point for this test is a run duration of about 104.40, in the units of test_duration (minutes, which is how TestGrid reports test durations).
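
How `optimal_stopping_point` arrives at this value is not shown in this notebook. One plausible reading, given that the approach compares the fitted distributions of passing and failing runs, is to look for the duration beyond which a run resembles a failure more than a pass. The sketch below illustrates that idea with two made-up normal distributions and `scipy.optimize.brentq`; the parameters and the crossing-point criterion are assumptions, not the helper's confirmed logic.

```python
from scipy import stats
from scipy.optimize import brentq

# Made-up fitted parameters, in the same units as test_duration.
failing = stats.norm(loc=100.0, scale=10.0)
passing = stats.norm(loc=90.0, scale=5.0)


def density_gap(t):
    """Positive where a run of length t looks more like a failure than a pass."""
    return failing.pdf(t) - passing.pdf(t)


# Search to the right of the passing mean, where the gap changes sign once.
lower_bound = passing.mean()
upper_bound = failing.mean() + 3 * failing.std()
stop_at = brentq(density_gap, lower_bound, upper_bound)
print(round(stop_at, 2))
```

With real data, the two frozen distributions would be built from the best-fit distribution name and the failing and passing parameter lists obtained earlier.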

Conclusion

In this notebook we were able to:

Fetch the data for all passing and failing tests
Filter the data for the test - operator.Run multi-stage test e2e-aws-upgrade - e2e-aws-upgrade-openshift-e2e-test container test
Find the best distribution for the test
Find the optimal stopping point for the test