Missing data? Relevance has it covered
We show that relevance-based prediction offers an elegant solution to the problem of incomplete data, preserving valuable information and enhancing prediction reliability in a way that is not possible using traditional models.
May 2025
When setting out to form data-driven predictions, it’s common to encounter incomplete information, such as a time series with shorter history lengths or observations with missing data. Traditional methods for addressing this challenge either discard valuable data or manufacture replacements based on limiting assumptions, leading to unreliable results.
We propose a novel technique called Relevance-Based Prediction (RBP), which elegantly navigates the pitfalls of missing data by retaining more information and accounting for the relative importance of observations for which only partial data is available.
Key highlights
Imagine trying to solve a puzzle with missing pieces. You might think that without all the pieces, the puzzle is unsolvable. But what if you could use the remaining pieces to create a clearer picture despite the gaps? This thought experiment mirrors the challenge of forming data-driven predictions with incomplete information. Traditional methods either discard valuable data or manufacture replacements based on limiting assumptions, leading to unreliable results.
RBP is a model-free routine that forms predictions as weighted averages of observed outcomes, with weights based on a precise measure called relevance. The technique leverages fit, which quantifies the reliability of each prediction, and grid prediction, which combines multiple predictions from different combinations of observations and variables. Taken together, these components act like a chef using the best available ingredients to create a dish, even if some ingredients are missing. This method ensures that the final prediction is robust and reliable, much like a well-balanced recipe.
To do this in practice, RBP assigns zero relevance weights to observations with missing information and blends data across a prediction grid where predictive variables are included and excluded in different combinations. As we illustrate in our paper, this process allows for the retention of more observations, accounts for the relative importance of missing data and provides advance notice of how missing data impacts the reliability of each prediction.
The bottom line is that RBP offers a sophisticated solution to the problem of incomplete information, preserving valuable data and enhancing prediction reliability in a way that is not possible using traditional models. By treating missing information with the care it deserves, RBP transforms the challenge of missing data into an opportunity for more accurate and insightful predictions.