Making and Evaluating Point Forecasts

Tilmann Gneiting
University of Heidelberg

Single-valued point forecasts continue to be issued and used in almost
all realms of science and society.  Typically, competing point
forecasters or forecasting procedures are compared and assessed by
means of an error measure or scoring function, such as the absolute
error or the squared error, that depends both on the point forecast
and the realized observation.  The individual scores are then
averaged over forecast cases, to result in a summary measure of the
predictive performance, such as the mean absolute error or the (root)
mean squared error.  I demonstrate that this common practice can lead
to grossly misguided inferences, unless the scoring function and the
forecasting task are carefully matched.
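A small simulation makes the point concrete. It is a sketch under assumed conditions, not an example taken from the paper. Outcomes are drawn from a standard log-normal distribution, and two hypothetical forecasters both know that distribution exactly. One reports its median and the other its mean. Ranking them by mean absolute error and by mean squared error should give opposite verdicts, even though neither forecaster has more information than the other.

```python
import math
import random
from statistics import fmean


def mean_absolute_error(forecast, observations):
    """Average absolute error of a constant point forecast."""
    return fmean(abs(forecast - y) for y in observations)


def mean_squared_error(forecast, observations):
    """Average squared error of a constant point forecast."""
    return fmean((forecast - y) ** 2 for y in observations)


# Assumed data-generating process: Y ~ LogNormal(0, 1), seeded for reproducibility.
rng = random.Random(42)
observations = [rng.lognormvariate(0.0, 1.0) for _ in range(100_000)]

# Both (hypothetical) forecasters know the true distribution exactly.
forecasts = {
    "median forecaster": 1.0,          # median of LogNormal(0, 1) is exp(0)
    "mean forecaster": math.exp(0.5),  # mean of LogNormal(0, 1) is exp(1/2)
}

# (MAE, MSE) for each forecaster over the same observations.
scores = {
    name: (mean_absolute_error(x, observations), mean_squared_error(x, observations))
    for name, x in forecasts.items()
}

for name, (mae, mse) in scores.items():
    print(f"{name:>17}: MAE = {mae:.3f}, MSE = {mse:.3f}")

best_by_mae = min(scores, key=lambda name: scores[name][0])
best_by_mse = min(scores, key=lambda name: scores[name][1])
print("preferred under MAE:", best_by_mae)
print("preferred under MSE:", best_by_mse)
```

The reversal is what theory predicts. The median minimizes the expected absolute error and the mean minimizes the expected squared error, so with a sample this large the averages should rank the two forecasters in opposite orders.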

Effective point forecasting requires that the scoring function be
specified a priori, or that the forecaster receive a directive in the
form of a statistical functional, such as the mean or a quantile of
the predictive distribution.  If the scoring function is specified a
priori, the forecaster can issue an optimal point forecast, namely,
the Bayes rule, which minimizes the expected loss under the
forecaster's predictive distribution.  If the forecaster receives a
directive in the form of a functional, it is critical that the scoring
function be consistent for it, in the sense that following the
directive minimizes the expected score.  Any consistent scoring
function induces a proper scoring rule for probabilistic forecasts,
and a duality principle links Bayes rules and consistent scoring
functions.
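When the predictive distribution is discrete, the Bayes rule can be approximated by brute force. The sketch below is illustrative only. The predictive distribution, the search grid and the helper names are hypothetical. It minimizes the expected score over a grid of candidate forecasts and should recover three values: the mean under squared error, the median under absolute error, and the 0.8-quantile under the asymmetric piecewise linear ("pinball") score. Each score is consistent for the corresponding functional.

```python
# Hypothetical discrete predictive distribution as (value, probability) pairs.
PREDICTIVE = [(0.0, 0.4), (1.0, 0.3), (2.0, 0.2), (10.0, 0.1)]

# Assumed search grid of candidate point forecasts: 0.00, 0.01, ..., 10.00.
GRID = [k / 100 for k in range(1001)]


def squared_error(x, y):
    return (x - y) ** 2


def absolute_error(x, y):
    return abs(x - y)


def pinball(alpha):
    """Asymmetric piecewise linear score, consistent for the alpha-quantile."""
    def score(x, y):
        return ((1.0 if y < x else 0.0) - alpha) * (x - y)
    return score


def expected_score(score, x, predictive):
    """Expected score of the point forecast x when Y follows the predictive law."""
    return sum(p * score(x, y) for y, p in predictive)


def bayes_rule(score, predictive, grid):
    """Grid point with the smallest expected score (a grid-approximate Bayes rule)."""
    return min(grid, key=lambda x: expected_score(score, x, predictive))


mean = sum(p * y for y, p in PREDICTIVE)
sq_rule = bayes_rule(squared_error, PREDICTIVE, GRID)
abs_rule = bayes_rule(absolute_error, PREDICTIVE, GRID)
q80_rule = bayes_rule(pinball(0.8), PREDICTIVE, GRID)

print(f"squared error:   {sq_rule:.2f}  (predictive mean = {mean:.2f})")
print(f"absolute error:  {abs_rule:.2f}  (predictive median = 1.00)")
print(f"pinball, a=0.8:  {q80_rule:.2f}  (predictive 0.8-quantile = 2.00)")
```

Because the grid is finite, the search only approximates the Bayes rule in general. For this particular distribution, each exact minimizer happens to be a grid point.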

A functional is elicitable if there exists a scoring function that is
strictly consistent for it.  Expectations, ratios of expectations and
quantiles are elicitable.  For example, a scoring function is
consistent for the mean functional if and only if it is a Bregman
function.  It is consistent for a quantile if and only if it is
generalized piecewise linear.  Similar characterizations apply to
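ratios of expectations and to expectiles.  Weighted scoring functions
are consistent for functionals that adapt to the weighting in peculiar
ways: if a scoring function is consistent for a functional, then
multiplying it by a positive weight w(y) of the observation yields a
scoring function that is consistent for that functional evaluated at
the predictive distribution reweighted by w.  Not all functionals are
elicitable; for instance, conditional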
value-at-risk is not, despite its popularity in quantitative finance.
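The same brute-force approach can probe these elicitability results numerically. This sketch assumes a hypothetical discrete predictive distribution. It uses the Bregman form S(x, y) = phi(y) - phi(x) - phi'(x)(y - x) with phi convex. The squared error is the case phi(x) = x^2, and the choice phi(x) = exp(x) is an illustrative assumption; both should be minimized at the mean. Weighting the squared error by w(y) = y, again a hypothetical choice, should instead move the minimizer to E[Y^2]/E[Y]. That value is the mean of the predictive distribution reweighted by w.

```python
import math

# Hypothetical predictive distribution: four equally likely outcomes.
PREDICTIVE = [(0.5, 0.25), (1.0, 0.25), (2.0, 0.25), (4.0, 0.25)]

# Assumed search grid of candidate point forecasts: 0.000, 0.001, ..., 5.000.
GRID = [k / 1000 for k in range(5001)]


def bregman(phi, dphi):
    """Bregman scoring function for a convex phi with derivative dphi."""
    def score(x, y):
        return phi(y) - phi(x) - dphi(x) * (y - x)
    return score


def weighted(score, w):
    """Weight a scoring function by w(y), a function of the outcome only."""
    def weighted_score(x, y):
        return w(y) * score(x, y)
    return weighted_score


def minimizer(score, predictive, grid):
    """Grid point with the smallest expected score under the predictive distribution."""
    return min(grid, key=lambda x: sum(p * score(x, y) for y, p in predictive))


squared = bregman(lambda t: t * t, lambda t: 2.0 * t)  # reduces to (x - y) ** 2
exp_score = bregman(math.exp, math.exp)                # phi(t) = exp(t), also convex
weighted_squared = weighted(squared, lambda y: y)      # hypothetical weight w(y) = y

mean = sum(p * y for y, p in PREDICTIVE)
reweighted_mean = sum(p * y * y for y, p in PREDICTIVE) / mean  # E[Y^2] / E[Y]

r_sq = minimizer(squared, PREDICTIVE, GRID)
r_exp = minimizer(exp_score, PREDICTIVE, GRID)
r_weighted = minimizer(weighted_squared, PREDICTIVE, GRID)

print(f"mean = {mean:.3f}; squared error -> {r_sq:.3f}, exp-Bregman -> {r_exp:.3f}")
print(f"E[Y^2]/E[Y] = {reweighted_mean:.3f}; weighted squared error -> {r_weighted:.3f}")
```

A numerical check like this can only illustrate the characterizations, not prove them. It also says nothing about functionals such as conditional value-at-risk, for which no strictly consistent scoring function exists.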