# Iterative Optimization Heuristics Profiler

The **performance analyzer** for **I**terative **O**ptimization **H**euristics (IOHs).
It *provides*:

- a web-based interface to analyze and visualize the empirical performance of IOHs
- interactive plot
- statistical evaluation
- report generation
`R`

programming interfaces in the backend

## Development Team

- Hao Wang,
*Leiden Institute of Advanced Computer Science*. - Diederick Vermetten,
*Leiden Institute of Advanced Computer Science*. - Furong Ye,
*Leiden Institute of Advanced Computer Science*. - Carola Doerr,
*CNRS and Sorbonne University*. - Ofer Shir
*Migal - The Galilee Research Institute, Tel-Hai College*. - Thomas BĂ¤ck,
*Leiden Institute of Advanced Computer Science*.

## Installation

### Software dependency

- [mandatory]
`R`

As**IOHanalyzer**is written as a`R`

package, the`R`

environment has to be installed first. The binary file and installation manual for R can be found here https://cran.r-project.org/. - [optional]
`orca`

required to download plotly figures. Please see https://github.com/plotly/orca for the installation instruction. - [optional]
`inkscape`

required to support pdf and eps figure format in downloading. Please visit the Inskscape Wiki page http://wiki.inkscape.org/wiki/index.php/Installing_Inkscape for the detail.

## Host it online yourself?

We provide docker file for deploying **IOHanalyzer** on the server. Please see https://github.com/IOHprofiler/IOHanalyzer-docker for details.

## Reference

When using IOHprofiler and parts thereof, please kindly cite this work as

Carola Doerr, Hao Wang, Furong Ye, Sander van Rijn, Thomas BĂ¤ck: *IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics*, arXiv e-prints:1810.05281, 2018.

```
@ARTICLE{IOHprofiler,
author = {Carola Doerr and Hao Wang and Furong Ye and Sander van Rijn and Thomas B{\"a}ck},
title = {{IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics}},
journal = {arXiv e-prints:1810.05281},
archivePrefix = "arXiv",
eprint = {1810.05281},
year = 2018,
month = oct,
keywords = {Computer Science - Neural and Evolutionary Computing},
url = {https://arxiv.org/abs/1810.05281}
}
```

## Acknowledgements

This work was supported by the Chinese scholarship council (CSC No. 201706310143), a public grant as part of the Investissement dâ€™avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH, in a joint call with the Gaspard Monge Program for optimization, operations research, and their interactions with data sciences, by Paris Ile-de-France Region, and by COST Action CA15140 â€śImproving Applicability of Nature-Inspired Optimisation by Joining Theory and Practice (ImAppNIO)â€ť.

## License

This application is governed by the **BSD 3-Clause license**.

BSD 3-Clause License

Copyright © 2018, All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# Supported Data Format

The user is encouraged to analyze your own benchmark data using **IOHanalyzer**. For this purpose, users have prepare their own data set according to the supported data formats. At the time of writing, three data formats are supported by **IOHanalyzer**:

**COCO**data format as regulated in [1].**IOHprofiler**data format, which is motivated and modified from**COCO**data format.- A
**two-column**format that is simplified from**IOHprofiler**format.

When loading the data in programming interface (and in the graphical user interface as well), it is not necessary to specified which format the user took as **IOHanalyzer** detects the format automatically. For all three data formats, data files are organized in the same manner within the file system. The structure of data files exemplified as follows:

Generally, in the folder (e.g., `./`

here) that contains the data set, the following files are mandatory for **IOHanalyzer**:

*Meta-data*(e.g.,`IOHprofiler_f1_i1.info`

) summarizes the algorithmic performance for each problem instance, with naming the following naming convention:`IOHprofiler_f[:num:]_i[:num:].info`

for problem \(f1\) and instance \(1\). Note that one meta-data file can consist of several dimensions. Please see the detail below.*Raw-data*(e.g.,`IOHprofiler_f1_DIM100_i1.dat`

) are CSV-based files that contain performance information indexed by the running time. Raw-data files are named in the similar manner as with the meta-data, for example,`IOHprofiler_f1_DIM100_i1.dat`

for problem \(f1\), dimension \(100\) and instance \(1\). By default, the data files are organized in the group of test functions and stored in folders with naming convention:`data_[function ID]/`

. It is important to note that those three data formats only differ in structure of the raw-data files.

## Meta-data

When benchmarking, it is common to specify a number of different dimensions, functions and instances, resulting in a quite large number of data files (e.g., `*.dat`

files). It would make the data organization more structured if some meta data are provided. Here, the meta data are implemented in a format that is very similar to that in the well-known **COCO** environment. The meta data are indicated with suffix \verb|.info|. An small example is provided as follows:

```
suite = 'PBO', funcId = 10, DIM = 100, algId = '(1+1) fGA'
%
data_f10/IOHprofiler_f10_DIM625.dat, 1:1953125|5.59000e+02,
1:1953125|5.59000e+02, 1:1953125|5.59000e+02, 1:1953125|5.54000e+02,
1:1953125|5.59000e+02, 1:1953125|5.64000e+02, 1:1953125|5.54000e+02,
1:1953125|5.59000e+02, 1:1953125|5.49000e+02, 1:1953125|5.54000e+02,
1:1953125|5.49000e+02
suite = 'PBO', funcId = 10, DIM = 625, algId = '(1+1) fGA'
%
data_f10/IOHprofiler_f10_DIM625.dat, 1:1953125|5.59000e+02,
1:1953125|5.59000e+02, 1:1953125|5.59000e+02, 1:1953125|5.54000e+02,
1:1953125|5.59000e+02, 1:1953125|5.64000e+02, 1:1953125|5.54000e+02,
1:1953125|5.59000e+02, 1:1953125|5.49000e+02, 1:1953125|5.54000e+02,
1:1953125|5.49000e+02
...
```

Note that, as this meta information is also used in **IOHanalyzer** when loading the data, it is **crucial** to give an attention to the format of the meta data, if the user would convert their own data sets.

**The first line**stores some meta-information of the experiment as (name, value) pairs. Note that, such pairs are separated by commas and three names,`funcId`

,`DIM`

and`algId`

are*case-sensitive*and*mandatory*.**The second line**always starts with a single`%`

, indicating what follows this symbol should be the general comments from the user on this experiment. By default, it is left empty.**The third line**starts with the relative path to the actual data file, followed by the meta-information obtained on each instance, with the following format: \[\underbrace{1}_{\text{instance ID}}:\underbrace{1953125}_{\text{running time}}|\;\underbrace{5.59000e+02}_{\text{best-so-far f(x)}}\]

## Raw-data

Despite different events are used for those four types of data file, those files take the same format, which is adapted from CSV format to accommodate multiple runs/instances. Typically, this format is illustrated by the example below (with dummy data records):

```
"function evaluation" "current f(x)" "best-so-far f(x)" "current af(x)+b" "best af(x)+b" "parameter name" ...
1 +2.95000e+02 +2.95000e+02 +2.95000e+02 +2.95000e+02 0.000000 ...
2 +2.96000e+02 +2.96000e+02 +2.96000e+02 +2.96000e+02 0.001600 ...
4 +3.07000e+02 +3.07000e+02 +3.07000e+02 +3.07000e+02 0.219200 ...
9 +3.11000e+02 +3.11000e+02 +3.11000e+02 +3.11000e+02 0.006400 ...
12 +3.12000e+02 +3.12000e+02 +3.12000e+02 +3.12000e+02 0.001600 ...
16 +3.16000e+02 +3.16000e+02 +3.16000e+02 +3.16000e+02 0.006400 ...
20 +3.17000e+02 +3.17000e+02 +3.17000e+02 +3.17000e+02 0.001600 ...
23 +3.28000e+02 +3.28000e+02 +3.28000e+02 +3.28000e+02 0.027200 ...
27 +3.39000e+02 +3.39000e+02 +3.39000e+02 +3.39000e+02 0.059200 ...
"function evaluation" "current f(x)" "best-so-far f(x)" "current af(x)+b" "best af(x)+b" "parameter name" ...
1 +3.20000e+02 +3.20000e+02 +3.20000e+02 +3.20000e+02 1.000000 ...
24 +3.44000e+02 +3.44000e+02 +3.44000e+02 +3.44000e+02 2.000000 ...
60 +3.64000e+02 +3.64000e+02 +3.64000e+02 +3.64000e+02 3.000000 ...
"function evaluation" "current f(x)" "best-so-far f(x)" "current af(x)+b" "best af(x)+b" "parameter name" ...
... ... ... ... ... ... ...
```

Note that,

**[mandatory]**each*separation line*(line that starts with`"function evaluation"`

) serves as a separator among different independent runs of the same algorithm. Therefore, it is clear that the data block between two separation lines corresponds to a single run a triplet of (dimension, function, instance).**[mandatory]**`"function evaluation"`

the current number of function evaluations.**[mandatory]**`"best-so-far f(x)"`

keeps track of the best function value observed since the beginning of one run.**[optional]**`"current f(x)"`

stands for the function value observed when the corresponding number of function evaluation is consumed.**[optional]**The value stored under`"current af(x)+b"`

and`"best af(x)+b"`

, are so-called*transformed*function values, obtained on each function instances that are generated by translating the orginal function in its domain and co-domain.**[optional]**In addition, a parameter value (named`"parameter"`

) is also tracked in this example and recording more parameter value is also facilitated (see below).

## Two-Column Format

The raw data file, in its simplest **two-column** format should resemble the following example:

```
"function evaluation" "best-so-far f(x)"
1 +2.95000e+02
2 +2.96000e+02
4 +3.07000e+02
23 +3.28000e+02
27 +3.39000e+02
"function evaluation" "best-so-far f(x)"
1 +3.20000e+02
... ...
```

This format is regulated as follows:

- The
*double quotation*(`"`

) in the separation line shall always be kept and it cannot be replace with single quotation (`'`

). - The numbers in the record can either be written in the plain or scientific notation.
- To separate the column,
*a single space or tab*can be used (only one of them should be used consistently in a single data file). - In the target-based tracking file (
`*.dat`

) each block of records (as divided by the separation line)**must**end with the last function evaluation. - Each data line should contain a complete record. Incomplete data lines will be dropped when loading the data into
**IOHanalyzer**. - In case the quotation mark is needed in the parameter name, please use the single quotation (
`'`

).

## Reference

[1] Hansen N, Auger A, Finck S, Ros R (2009a). “Real-Parameter Black-Box OptimizationBenchmarking 2009: Experimental Setup.” Research Report RR-6828, INRIA. URL https://hal.inria.fr/inria-00362649.

Welcome to IOHanalyzer!

This the **analyzer** for the empirical performance of *Iterative Optimization Heuristics* (IOHs). It provides *two perspectives* to look at the empirical performance:

- Fixed-Target running time: focuses on the distribution of
**running time**(a.k.a. the number of function evaluations) at various given target values (thus called “fixed-target”), functions and dimensions. - Fixed-Budget results: focuses on the distribution of the
**best function value**at various given budget values (thus called “fixed-budget”), functions and dimensions.

To get started, you could

**upload**your own dataset using the*Upload Data*box on the bottom left. The supported format is specified in the data format tab.- or
**choose**one of the pre-loaded datasets in the*Load Data from Repository*box on the bottom right and directly explore the analyzer.

In details, the following functionalities are provided:

Data Summary | Expected Values | Probability Density/Mass Function | Empirical Cumulative Distribution (ECDF) | |
---|---|---|---|---|

Fixed-Target running time | Summary of running times | Expected Running Time (ERT) | Probability Mass Function of running time | ECDF of running times |

Fixed-Budget results | Summary of function values | Expected Target Value | Probability Density Function of target values | ECDF of targets times |

For more information on the features of IOHanalyzer and how to use them, as well as a full specification of the accepted data formats, please visit our wiki.

Upload Data

Load Data from Repository

Data Processing Prompt

List of Processed Data

Overview of All Available Functions

Overview of Selected Function

Data Overview

This table provides an overview on function values for all algorithms chosen on the left:

- worst recorded: the worst \(f(x)\) value ever recorded across
*all iterations*, - worst reached: the worst \(f(x)\) value reached in the
*last iterations*, - best reached: the best \(f(x)\) value reached in the
*last iterations*, - mean reached: the mean \(f(x)\) value reached in the
*last iterations*, - median reached: the median \(f(x)\) value reached in the
*last iterations*, - succ: the number of runs which successfully hit the best reached \(f(x)\).

Runtime Statistics at Chosen Target Values

This table summarizes for each algorithm and each target value chosen on the left:

- runs: the number of runs that have found at least one solution of the required target quality \(f(x)\),
- mean: the average number of function evaluations needed to find a solution of function value at least \(f(x)\)
- median, \(2\%, 5\%,\ldots,98\%\) : the quantiles of these first-hitting times

When not all runs managed to find the target value, the statistics hold only for those runs that did. That is, the mean value is the mean of the successful runs. Same for the quantiles. An alternative version with simulated restarts is currently in preparation.

Original Runtime Samples

This table shows for each selected algorithm \(A\), each selected target value \(f(x)\), and each run \(r\) the number \(T(A,f(x),r)\) of evaluations performed by the algorithm until it evaluated for the first time a solution of quality at least \(f(x)\).

Expected Runtime (ERT): single function

The ** mean, median, standard deviation and ERT** of the runtime samples
are depicted against the best objective values.
The displayed elements (mean, median, standard deviations and ERT)
can be switched on and off by clicking on the legend on the right.
A

**tooltip**and

**toolbar**appears when hovering over the figure.

Expected Runtime Comparisons (across dimensions)

The ** ERT** of the runtime samples across all functions.
ERT is decided based on the target values in the table below,
with the default being the

**best reached f(x) by any of the selected algorithms**.

*Infinite ERTS*are shown as seperate dots on the graph.

The chosen **target values** per dimension are as follows
(double click an entry to edit it):

The raw **ERT**-values are:

Expected Runtime (ERT): all functions

The ** ERT** is shown against the target
values for all functions in the selected dimension.

Expected Runtime Comparisons (across functions on one dimension)

The ** ERT** of the runtime samples across all functions.
ERT is decided based on the target values in the table below,
with the default being the

**best reached f(x) by any of the selected algorithms**.

*Infinite ERTS*are shown as seperate dots on the graph.

The chosen **target values** per function are as follows
(double click an entry to edit it):

The raw **ERT**-values are:

### Histogram of Fixed-Target Runtimes

This histogram counts how many runs needed between
\(t\) and \(t+1\) function evaluations. The bins
\([t,t+1)\) are chosen automatically. The bin size is determined
by the so-called **Freedmanâ€“Diaconis rule**: \(\text{Bin size}=
2\frac{Q_3 - Q_1}{\sqrt[3]{n}}\), where \(Q_1, Q_3\) are the \(25\%\)
and \(75\%\) percentile of the runtime and \(n\) is the sample size.
The displayed algorithms can be selected by clicking on the legend on the right.
A **tooltip** and **toolbar** appears when hovering over the figure.

### Empirical Probability Mass Function of the Runtime

**Warning! **The
**probability mass function** of the runtime is approximated by the
treating the runtime as a *continuous* random variable and
applying the **kernel estimation** (KDE):

The plot shows the distribution of the first hitting
times of the individual runs (dots), and an estimated
distribution of the probability mass function.
The displayed algorithms can be selected by clicking on
the legend on the right. A **tooltip** and **toolbar**
appear when hovering over the figure. This also includes the
option to download the plot as png file. A csv file with the runtime
data can be downlowaded from the
Data Summary tab.

Empirical Cumulative Distribution: Single target

Each EDCF curve shows the proportion of the runs
that have found a solution of at least the required
target value within the budget given by the \(x\)-axis.
The displayed curves can be selected by clicking on the legend on the right. A **tooltip**
and **toolbar** appears when hovering over the figure.
This also includes the option to download the plot as png file.

Aggregated Empirical Cumulative Distribution: Single function

The evenly spaced target values are:

The fraction of (run,target value)
pairs \((i,v)\) satisfying that the best solution that the algorithm has
found in the \(i\)-th run within the given time budget \(t\) has quality at least
\(v\) is plotted against the available budget \(t\). The displayed elements can be switched
on and off by clicking on the legend on the right. A **tooltip**
and **toolbar** appears when hovering over the figure.

Area Under the ECDF

The **area under the ECDF** is
caculated for the sequence of target values specified on the left. The displayed
values are normalized against the maximal number of function evaluations for
each algorithm. Intuitively, the larger the area, the better the algorithm.
The displayed algorithms can be selected by clicking on the legend on the right.
A **tooltip** and **toolbar** appears when hovering over the figure.
This also includes the option to download the plot as png file.

Aggregated Empirical Cumulative Distribution: All functions

The fraction of (run,target value, ...)
pairs \((i,v, ...)\) satisfying that the best solution that the algorithm has
found in the \(i\)-th (run of function \(f\) in dimension \(d\)) within
the given time budget \(t\) has quality at least \(v\) is plotted against
the available budget \(t\). The displayed elements can be switched
on and off by clicking on the legend on the right. A **tooltip**
and **toolbar** appears when hovering over the figure. Aggregation over
functions and dimension can be switched on or off using the checkboxes on
the left; when aggregation is off the selected function / dimension
is chosen according the the value in the bottom-left selection-box.

The selected targets are:

Parameter Statistics at Chosen Target Values

This table summarizes for each algorithm and each target value chosen on the left:

- runs: the number of runs where non-missing parameter values are observed, for each required target value \(f(x)\),
- mean: the average value of the specified
**parameter**when target value \(f(x)\) is hit. - median, \(2\%, 5\%,\ldots,98\%\) : the quantiles of these parameter values

When not all runs managed to find the target value, the statistics hold only for those runs that did. That is, the mean value is the mean of the successful runs. Same for the quantiles. An alternative version with simulated restarts is currently in preparation.

Expected Parameter Value (per function)

The ** mean or median** of internal parameters of the algorithm
found with a fixed-budget of evaluations are depicted against the budget.
The displayed elements can be switched on and off by clicking on the legend on the right.
A

**tooltip**and

**toolbar**appears when hovering over the figure.

Parameter Sample at Chosen Target Values

This table shows for each selected algorithm \(A\), each selected target value \(f(x)\), and each run \(r\) the parameter value observed when the target value \(f(x)\) is reached for the first time.

Hypothesis Testing

The **Kolmogorov-Smirnov test** is performed on empirical CDFs of running times for each pair of
algorithms, in order to determine which algorithm gives a significantly
smaller running time distribution. The resulting p-values are arranged in a matrix, where
each cell \((i, j)\) contains a p-value from the test with the alternative hypothesis:
the running time of algorithm \(i\) is smaller (thus better) than that of \(j\).

Decisions of the test, based on the \(p-\) value matrix and the \(\alpha\) value,
are visualized in a heatmap (**left**) and a network (**right**).
In each cell (row, column) of the heatmap, the alternative hypothesis is again:
the row algorithm has smaller (better) running time than the column.
The color indicates:

- Red: A row is better than the column
- Blue: A row is worse than the column
- Gray: no significant distinction between the row and column

Glicko2-based ranking

The **Glicko2** This procedure ranks algorithms based on a glico2-procedure.
Every round, for every function and dimension of the datasetlist,
each pair of algorithms competes. This competition samples a random runtime for the
provided target (best achieved target among all algorithms). Whichever algorithm has the lower
runtime wins the game. Then, from these games, the glico2-rating is used to determine the ranking.

The chosen **target function values** per (function, dimension)-pair are as follows
(double click an entry to edit it):

The results of the ranking are:

Data Overview

Target Statistics at Chosen Budget Values

This table summarizes for each algorithm and each **budget** \(B\) chosen on the left:

- runs: the number of runs that have performed at least \(B\) evaluations,
- mean: the average best-so-far function value obtain within a budget of \(B\) evaluations,
- median, \(2\%, 5\%,\ldots,98\%\) : the quantiles of the best function values found within the first \(B\) evaluations.

When not all runs evaluated at least \(B\) search points, the statistics hold for the subset of runs that did. Alternative statistics using simulated restarted algorithms are in preparation.

Original Target Samples

Expected Target Value (per function)

The ** mean, median and standard deviation** of the best function values
found with a fixed-budget of evaluations are depicted against the budget.
The displayed elements can be switched on and off by clicking on the legend on the right.
A

**tooltip**and

**toolbar**appears when hovering over the figure.

Expected Function values: all functions

Expected Function Value comparisons

The chosen budget values per function are as follows (double click an entry to edit it):

The raw function-values are:

### Histogram of Fixed-Budget Targets

This histogram counts the number of runs whose best-so-far function
values within the first \(B\) evaluations is between \(v_i\) and
\(v_{i+1}\). The buckets \([v_i,v_{i+1})\) are chosen automatically
according to the so-called **Freedmanâ€“Diaconis rule**: \(\text{Bin size}=
2\frac{Q_3 - Q_1}{\sqrt[3]{n}}\), where \(Q_1, Q_3\) are the \(25\%\)
and \(75\%\) percentile of the runtime and \(n\) is the sample size.
The displayed algorithms can be selected by clicking on the legend on the right.
A **tooltip** and **toolbar** appears when hovering over the figure.

### Empirical Probability Density Function of Fixed-Budget Function Values

The plot shows, for the budget selected on the left, the distribution
of the best-so-far function values of the individual runs (dots), and an estimated distribution of the probability mass function.
The displayed algorithms can be selected by clicking on the legend on the right. A **tooltip** and **toolbar**
appear when hovering over the figure. A csv file with the runtime data can be downloaded from the
Data Summary tab.

Empirical Cumulative Distribution of the Fixed-Budget Values: Single Budgets

Each EDCF curve shows the proportion of the runs that have found
within the given budget B a solution of at least the required target
value given by the x-axis. The displayed curves can be selected
by clicking on the legend on the right. A **tooltip** and **toolbar**
appears when hovering over the figure.

Empirical Cumulative Distribution of the Fixed-Budget Values: Aggregation

The evenly spaced budget values are:

The fraction of (run,budget) pairs \((i,B)\) satisfying that the best
solution that the algorithm has found in the \(i\)-th run within the
first \(B\) evaluations has quality at **most** \(v\) is plotted
against the target value \(v\). The displayed elements can be switched
on and off by clicking on the legend on the right. A **tooltip** and
**toolbar** appears when hovering over the figure.

Area Under the ECDF

The **area under the ECDF** is
caculated for the sequence of budget values specified on the left. The displayed
values are normalized against the maximal target value recorded for
each algorithm. Intuitively, the **smaller** the area, the **better** the algorithm.
The displayed algorithms can be selected by clicking on the legend on the right.
A **tooltip** and **toolbar** appears when hovering over the figure.

Expected Parameter Value (per function)

The ** mean or median** of internal parameters of the algorithm
found with a fixed-budget of evaluations are depicted against the budget.
The displayed elements can be switched on and off by clicking on the legend on the right.
A

**tooltip**and

**toolbar**appears when hovering over the figure.

Parameter Statistics at Chosen Runtime Values

This table summarizes for each algorithm and each runtime value chosen on the left:

- runs: the number of runs where non-missing parameter values are observed, for each required runtime value,
- mean: the average value of the specified
**parameter**after the specified amount of function evaluations. - median, \(2\%, 5\%,\ldots,98\%\) : the quantiles of these parameter values

When not all runs used the specified runtime value, the statistics hold only for those runs that did. That is, the mean value is the mean of the applicable runs. Same for the quantiles.

Parameter Sample at Chosen Runtime Values

This table shows for each selected algorithm \(A\), each selected runtime value \(b\), and each run \(r\) the parameter value observed when the after a total budget of \(b\) function evaluations has been used.

Hypothesis Testing

The **Kolmogorov-Smirnov test** is performed on empirical CDFs of running times for each pair of
algorithms, in order to determine which algorithm gives a significantly
smaller running time distribution. The resulting p-values are arranged in a matrix, where
each cell \((i, j)\) contains a p-value from the test with the alternative hypothesis:
the running time of algorithm \(i\) is smaller (thus better) than that of \(j\).

Decisions of the test, based on the \(p-\) value matrix and the \(\alpha\) value,
are visualized in a heatmap (**left**) and a network (**right**).
In each cell (row, column) of the heatmap, the alternative hypothesis is again:
the row algorithm has smaller (better) running time than the column.
The color indicates:

- Red: A row is better than the column
- Blue: A row is worse than the column
- Gray: no significant distinction between the row and column

Glicko2-based ranking

The **Glicko2** This procedure ranks algorithms based on a glico2-procedure.
Every round, for every function and dimension of the datasetlist,
each pair of algorithms competes. This competition samples a random function value for the
provided runtime (maximum used among all algorithms). Whichever algorithm has the better
function value wins the game. Then, from these games, the glico2-rating is used to determine the ranking.

The chosen **budget values** per (function, dimension)-pair are as follows
(double click an entry to edit it):

The results of the ranking are:

Color Settings

Example of the current colorscheme.

General settings

Set the figure download properties

Set the figure fontsizes