
Percent of Failing Tests Fixed

This notebook is an addition to the series of KPI notebooks in which we calculate key performance indicators for CI processes. In this notebook, we calculate the KPI "percent of failing tests fixed in each run/timestamp." Essentially, we will determine

  • percent of tests that were failing and are now fixed

For OpenShift managers, this information can help quantify the agility and efficiency of their team. If this number is high, the team is quickly identifying the root causes of the tests that failed in the previous run and fixing them. Conversely, if this number is low, only a small percentage of previously failing tests gets fixed in each new run, which suggests that the CI process is not as efficient as it could be.
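
Concretely, for each tab, grid, and timestamp t, the metric computed later in this notebook is

pct_fixed(t) = (number of tests fixed at t) / (number of tests failing at t - 1)

where a test counts as fixed at t if it was failing at t - 1 and is passing at t.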

Related issues: #149

[1]
import os
import gzip
import json
import datetime

import numpy as np
import pandas as pd

from ipynb.fs.defs.metric_template import decode_run_length
from ipynb.fs.defs.metric_template import testgrid_labelwise_encoding
from ipynb.fs.defs.metric_template import CephCommunication
from ipynb.fs.defs.metric_template import save_to_disk, read_from_disk

from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())
True
[2]
## Specify variables

METRIC_NAME = "pct_fixed_each_ts"

# Specify the path for input grid data
INPUT_DATA_PATH = "../../../../data/raw/testgrid_183.json.gz"

# Specify the path for output metric data
OUTPUT_DATA_PATH = f"../../../../data/processed/metrics/{METRIC_NAME}"

## Ceph bucket variables
## Create a .env file locally with the correct configs
s3_endpoint_url = os.getenv("S3_ENDPOINT")
s3_access_key = os.getenv("S3_ACCESS_KEY")
s3_secret_key = os.getenv("S3_SECRET_KEY")
s3_bucket = os.getenv("S3_BUCKET")
s3_path = os.getenv("S3_PROJECT_KEY", "ai4ci/testgrid/metrics")
s3_input_data_path = "raw_data"
AUTOMATION = os.getenv("IN_AUTOMATION")
[3]
## Import data
timestamp = datetime.datetime.today()

if AUTOMATION:
    filename = f"testgrid_{timestamp.day}{timestamp.month}.json"
    cc = CephCommunication(s3_endpoint_url, s3_access_key, s3_secret_key, s3_bucket)
    s3_object = cc.s3_resource.Object(s3_bucket, f"{s3_input_data_path}/{filename}")
    file_content = s3_object.get()["Body"].read().decode("utf-8")
    testgrid_data = json.loads(file_content)

else:
    with gzip.open(INPUT_DATA_PATH, "rb") as read_file:
        testgrid_data = json.load(read_file)

Calculation

To find fixed tests, we modified the testgrid_labelwise_encoding function. The loop is adapted to record "True" if a test was fixed in the current run and "False" otherwise. That is, instead of indicating "is_flake" or "is_pass," it indicates "is passing now but was failing before," aka "is_flip."
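
For intuition, here is a minimal toy sketch of that flip detection applied to an already-decoded status sequence (in TestGrid's encoding, 1 means the test passed and 12 means it failed; the sequence is ordered from most recent to least recent run):

statuses_decoded = [1, 12, 12, 1]  # toy example, most recent run first

did_get_fixed = []
for i in range(len(statuses_decoded) - 1):
    # "fixed" here means: passing in this run (1) but failing in the previous, older run (12)
    did_get_fixed.append(statuses_decoded[i] == 1 and statuses_decoded[i + 1] == 12)

# the oldest run has no predecessor, so it can never count as a fix
did_get_fixed.append(False)

print(did_get_fixed)  # [True, False, False, False]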

[4]
# NOTE: this for loop is a modified version of the testgrid_labelwise_encoding function

percent_label_by_grid_csv = []

for tab in testgrid_data.keys():
    print(tab)

    for grid in testgrid_data[tab].keys():
        current_grid = testgrid_data[tab][grid]

        # get all timestamps for this grid (x-axis of grid)
        timestamps = [
            datetime.datetime.fromtimestamp(t // 1000)
            for t in current_grid["timestamps"]
        ]

        tests = []
        all_tests_did_get_fixed = []

        # NOTE: this list of dicts goes from most recent to least recent
        for i, current_test in enumerate(current_grid["grid"]):
            tests.append(current_test["name"])
            statuses_decoded = decode_run_length(current_grid["grid"][i]["statuses"])

            did_get_fixed = []
            for status_i in range(0, len(statuses_decoded) - 1):
                did_get_fixed.append(
                    statuses_decoded[status_i] == 1
                    and statuses_decoded[status_i + 1] == 12
                )

            # the least recent run cannot be marked "True", since we assume it wasn't failing before
            did_get_fixed.append(False)

            # add results for all timestamps for current test
            all_tests_did_get_fixed.append(np.array(did_get_fixed))

        all_tests_did_get_fixed = [
            list(zip(timestamps, g)) for g in all_tests_did_get_fixed
        ]

        # add the test, tab and grid name to each entry
        # TODO: any ideas for avoiding this quad-loop
        for i, d in enumerate(all_tests_did_get_fixed):
            for j, k in enumerate(d):
                all_tests_did_get_fixed[i][j] = (k[0], tab, grid, tests[i], k[1])

        # accumulate the results
        percent_label_by_grid_csv.append(all_tests_did_get_fixed)

# output above leaves us with a doubly nested list. Flatten
flat_list = [item for sublist in percent_label_by_grid_csv for item in sublist]
flatter_list = [item for sublist in flat_list for item in sublist]
"redhat-assisted-installer" "redhat-openshift-informing" "redhat-openshift-ocp-release-4.1-blocking" "redhat-openshift-ocp-release-4.1-informing" "redhat-openshift-ocp-release-4.2-blocking" "redhat-openshift-ocp-release-4.2-informing" "redhat-openshift-ocp-release-4.3-blocking" "redhat-openshift-ocp-release-4.3-broken" "redhat-openshift-ocp-release-4.3-informing" "redhat-openshift-ocp-release-4.4-blocking" "redhat-openshift-ocp-release-4.4-broken" "redhat-openshift-ocp-release-4.4-informing" "redhat-openshift-ocp-release-4.5-blocking" "redhat-openshift-ocp-release-4.5-broken" "redhat-openshift-ocp-release-4.5-informing" "redhat-openshift-ocp-release-4.6-blocking" "redhat-openshift-ocp-release-4.6-broken" "redhat-openshift-ocp-release-4.6-informing" "redhat-openshift-ocp-release-4.7-blocking" "redhat-openshift-ocp-release-4.7-broken" "redhat-openshift-ocp-release-4.7-informing" "redhat-openshift-ocp-release-4.8-blocking" "redhat-openshift-ocp-release-4.8-informing" "redhat-openshift-ocp-release-4.9-blocking" "redhat-openshift-ocp-release-4.9-informing" "redhat-openshift-okd-release-4.3-informing" "redhat-openshift-okd-release-4.4-informing" "redhat-openshift-okd-release-4.5-blocking" "redhat-openshift-okd-release-4.5-informing" "redhat-openshift-okd-release-4.6-blocking" "redhat-openshift-okd-release-4.6-informing" "redhat-openshift-okd-release-4.7-blocking" "redhat-openshift-okd-release-4.7-informing" "redhat-openshift-okd-release-4.8-blocking" "redhat-openshift-okd-release-4.8-informing" "redhat-openshift-okd-release-4.9-informing" "redhat-openshift-presubmit-master-gcp" "redhat-osd" "redhat-single-node"
[5]
flatter_list[0]
(datetime.datetime(2021, 3, 15, 23, 40, 20),
 '"redhat-assisted-installer"',
 'periodic-ci-openshift-release-master-nightly-4.6-e2e-metal-assisted',
 'Overall',
 False)
[6]
# this df indicates whether a test was fixed or not at a given timestamp (as compared to previous one)
df_csv = pd.DataFrame(
    flatter_list, columns=["timestamp", "tab", "grid", "test", "did_get_fixed"]
)
df_csv.head()
timestamp tab grid test did_get_fixed
0 2021-03-15 23:40:20 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... Overall False
1 2021-03-15 00:01:06 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... Overall False
2 2021-03-13 20:51:32 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... Overall False
3 2021-03-13 07:51:20 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... Overall False
4 2021-03-13 06:43:20 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... Overall False
[7]
# each element in this multiindexed series tells how many tests got fixed at each run/timestamp
num_fixed_per_ts = df_csv.groupby(["tab", "grid", "timestamp"]).did_get_fixed.sum()
num_fixed_per_ts
tab                          grid                                                                             timestamp          
"redhat-assisted-installer"  periodic-ci-openshift-release-master-nightly-4.6-e2e-metal-assisted              2021-03-04 01:01:58    0
                                                                                                              2021-03-04 04:21:57    0
                                                                                                              2021-03-04 07:22:22    0
                                                                                                              2021-03-04 08:47:55    0
                                                                                                              2021-03-04 23:12:31    0
                                                                                                                                    ..
"redhat-single-node"         periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-single-node-live-iso  2021-03-14 00:01:00    0
                                                                                                              2021-03-15 00:01:09    0
                                                                                                              2021-03-16 00:01:15    0
                                                                                                              2021-03-17 00:00:40    0
                                                                                                              2021-03-18 00:01:25    0
Name: did_get_fixed, Length: 37761, dtype: int64
[8]
# get an entry for every test/timestamp indicating whether the test was failing
# (status code 12 corresponds to a failing test in testgrid)
build_failures_list = testgrid_labelwise_encoding(testgrid_data, 12)
[9]
# this df indicates whether a test was failing or not at a given timestamp
failures_df = pd.DataFrame(
    build_failures_list,
    columns=["timestamp", "tab", "grid", "test", "test_duration", "failure"],
)
failures_df.head()
timestamp tab grid test test_duration failure
0 2021-03-15 23:40:20 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... Overall 80.283333 False
1 2021-03-15 00:01:06 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... Overall 92.050000 False
2 2021-03-13 20:51:32 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... Overall 80.983333 False
3 2021-03-13 07:51:20 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... Overall 117.716667 False
4 2021-03-13 06:43:20 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... Overall 108.633333 False
[10]
# each element in this multiindexed series tells how many tests failed at each run/timestamp
num_failures_per_ts = failures_df.groupby(["tab", "grid", "timestamp"]).failure.sum()
num_failures_per_ts
tab                          grid                                                                             timestamp          
"redhat-assisted-installer"  periodic-ci-openshift-release-master-nightly-4.6-e2e-metal-assisted              2021-03-04 01:01:58    0
                                                                                                              2021-03-04 04:21:57    0
                                                                                                              2021-03-04 07:22:22    0
                                                                                                              2021-03-04 08:47:55    0
                                                                                                              2021-03-04 23:12:31    0
                                                                                                                                    ..
"redhat-single-node"         periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-single-node-live-iso  2021-03-14 00:01:00    0
                                                                                                              2021-03-15 00:01:09    0
                                                                                                              2021-03-16 00:01:15    0
                                                                                                              2021-03-17 00:00:40    0
                                                                                                              2021-03-18 00:01:25    0
Name: failure, Length: 37761, dtype: int64
[11]
# dividing the number of tests fixed at each timestamp by the number of tests
# failing at the previous timestamp (hence the .shift()) gives the percent of
# failing tests that got fixed at each timestamp
pct_fixed_per_ts = (num_fixed_per_ts / num_failures_per_ts.shift()).fillna(0)
pct_fixed_per_ts
tab                          grid                                                                             timestamp          
"redhat-assisted-installer"  periodic-ci-openshift-release-master-nightly-4.6-e2e-metal-assisted              2021-03-04 01:01:58    0.0
                                                                                                              2021-03-04 04:21:57    0.0
                                                                                                              2021-03-04 07:22:22    0.0
                                                                                                              2021-03-04 08:47:55    0.0
                                                                                                              2021-03-04 23:12:31    0.0
                                                                                                                                    ... 
"redhat-single-node"         periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-single-node-live-iso  2021-03-14 00:01:00    0.0
                                                                                                              2021-03-15 00:01:09    0.0
                                                                                                              2021-03-16 00:01:15    0.0
                                                                                                              2021-03-17 00:00:40    0.0
                                                                                                              2021-03-18 00:01:25    0.0
Length: 37761, dtype: float64
[12]
# convert to df from multiindex series
pct_fixed_per_ts_df = pct_fixed_per_ts.reset_index().rename(columns={0: "pct_fixed"})
pct_fixed_per_ts_df
tab grid timestamp pct_fixed
0 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... 2021-03-04 01:01:58 0.0
1 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... 2021-03-04 04:21:57 0.0
2 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... 2021-03-04 07:22:22 0.0
3 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... 2021-03-04 08:47:55 0.0
4 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... 2021-03-04 23:12:31 0.0
... ... ... ... ...
37756 "redhat-single-node" periodic-ci-openshift-release-master-nightly-4... 2021-03-14 00:01:00 0.0
37757 "redhat-single-node" periodic-ci-openshift-release-master-nightly-4... 2021-03-15 00:01:09 0.0
37758 "redhat-single-node" periodic-ci-openshift-release-master-nightly-4... 2021-03-16 00:01:15 0.0
37759 "redhat-single-node" periodic-ci-openshift-release-master-nightly-4... 2021-03-17 00:00:40 0.0
37760 "redhat-single-node" periodic-ci-openshift-release-master-nightly-4... 2021-03-18 00:01:25 0.0

37761 rows × 4 columns

Save to Ceph or local

Save the dataframe in parquet format to the Ceph bucket or locally.

[13]
save = pct_fixed_per_ts_df

if AUTOMATION:
    cc = CephCommunication(s3_endpoint_url, s3_access_key, s3_secret_key, s3_bucket)
    cc.upload_to_ceph(
        save,
        s3_path,
        f"{METRIC_NAME}/{METRIC_NAME}-{timestamp.year}-{timestamp.month}-{timestamp.day}.parquet",
    )
else:
    save_to_disk(
        save,
        OUTPUT_DATA_PATH,
        f"{METRIC_NAME}-{timestamp.year}-{timestamp.month}-{timestamp.day}.parquet",
    )
[14]
## Sanity check to see if the dataset is the same
if AUTOMATION:
    sanity_check = cc.read_from_ceph(
        s3_path,
        f"{METRIC_NAME}/{METRIC_NAME}-{timestamp.year}-{timestamp.month}-{timestamp.day}.parquet",
    ).head()
else:
    sanity_check = read_from_disk(
        OUTPUT_DATA_PATH,
        f"{METRIC_NAME}-{timestamp.year}-{timestamp.month}-{timestamp.day}.parquet",
    ).head()

sanity_check
tab grid timestamp pct_fixed
0 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... 2021-03-04 01:01:58 0.0
1 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... 2021-03-04 04:21:57 0.0
2 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... 2021-03-04 07:22:22 0.0
3 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... 2021-03-04 08:47:55 0.0
4 "redhat-assisted-installer" periodic-ci-openshift-release-master-nightly-4... 2021-03-04 23:12:31 0.0

Conclusion

This notebook computed the percent of previously failing tests that got fixed at each run/timestamp. The dataframe saved on Ceph can be used to generate aggregated views and visualizations of the percent of fixed tests at each timestamp.
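
As a hypothetical example of such an aggregated view (a minimal sketch, assuming the pct_fixed_per_ts_df dataframe computed above is still in memory), one could average the metric per dashboard tab:

# illustrative only: average percent of failing tests fixed, per tab
avg_pct_fixed_per_tab = (
    pct_fixed_per_ts_df.groupby("tab").pct_fixed.mean().sort_values(ascending=False)
)
avg_pct_fixed_per_tab.head()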