PSR
Partial-sample regression function to estimate the similarity, informativeness, and relevance of dependent variables.
Estimates results from the partial-sample regression model described by Czasonis, Kritzman, and Turkington in their 2020 research paper (Journal of Portfolio Management; see the reference link below).
One of our principals, Mark Kritzman, introduced this powerful model in a lecture at State Street's research retreat in 2020. A recording of the lecture is available below.
The following describes the function signature for use in Microsoft Excel's formula bar.
whichStat
Required. String specifying the statistic to return. Use one of the following options:
"similarity"
"informativeness"
"relevance"
"scaledrelevance"
"rank"
"filter" = dummy vector to indicate relevant cross-sectional observations
"weighted", "relevanceweighted"
"yhat" = forecast value(s) for the dependent variable.
y
Required. Time series or matrix of dependent variables. This is typically the return series of your portfolios, managers, or asset classes.
x
Required. Time series or matrix of independent variables. This is typically a set of economic variables or factors.
theta
Optional. Vector of predictor values to use with the model parameters (coefficients) to forecast the response variable. If this argument is empty, the function uses the most recent cross-sectional values of the independent variables.
threshold
Optional. Relevance threshold: a numerical value specifying the minimum percentage of relevant periods. If the argument is not specified, it defaults to 0.50 (at least 50% of relevant periods are included in the forecast of the partial-sample regression).
Specify optional pairs of arguments where Name is the option argument name and Value is the corresponding input object. Name-value arguments must appear after other input arguments above, but the order of these pairs does not matter.
Example:
threshold
Threshold value that determines the relevance cutoff. If the argument is not specified, it defaults to 0.50 (at least 50% of relevant periods are included in the forecast of the partial-sample regression). See also the isPercentile option.
isPercentile
Logical (TRUE or FALSE) flag indicating whether the threshold value is in percentile units (TRUE) or is a level value (FALSE). Defaults to TRUE.
thresholdDirection
Value indicating the criterion used to evaluate relevance against the specified threshold value.
The default threshold direction is .
solveMaxFit
Logical (TRUE or FALSE) flag. If true, the regression model will solve for the maximum fit.
selectVariables
Logical (TRUE or FALSE) flag. If true, the regression model will solve for maximum fit with the optimal selection of variables. If false, then the model will use all variables when solving for maximum fit.
covariance
Covariance matrix of the independent variables.
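The interaction between threshold and isPercentile can be sketched in Python as follows. This is only an illustration of one plausible reading, not the add-in's implementation: the function name relevance_filter and the argument is_percentile are hypothetical, the comparison direction (keep observations at or above the cutoff) is an assumption because the default thresholdDirection is not stated here, and treating a percentile threshold as the empirical quantile of the relevance scores is likewise an assumption.

```python
import math


def relevance_filter(relevance, threshold=0.5, is_percentile=True):
    """Return a 0/1 dummy vector flagging observations at or above the cutoff.

    Assumptions (not confirmed by the function's documentation):
    - an observation is kept when its relevance >= cutoff;
    - with is_percentile=True the cutoff is the empirical `threshold`
      quantile of the relevance scores; otherwise `threshold` is used
      directly as a relevance level.
    """
    if is_percentile:
        ordered = sorted(relevance)
        # Position of the smallest score that is still retained.
        k = min(math.floor(threshold * len(ordered)), len(ordered) - 1)
        cutoff = ordered[k]
    else:
        cutoff = threshold
    return [1 if score >= cutoff else 0 for score in relevance]


# Hypothetical relevance scores for six periods.
scores = [-1.2, 0.4, 2.5, -0.3, 0.9, -2.3]

# Percentile mode: the three most relevant of six periods are flagged.
print(relevance_filter(scores))

# Level mode: only periods with relevance of at least 1.0 are flagged.
print(relevance_filter(scores, threshold=1.0, is_percentile=False))
```

Under these assumptions the default threshold of 0.50 keeps the more relevant half of the sample, which matches the "at least 50%" wording above; a level threshold instead keeps however many periods clear the stated relevance value.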
The function's output varies with the whichStat argument. The following table describes the corresponding output. For M dependent variables (y) and N independent variables (x) across T observations:
yHat, prediction
Forecast of dependent variable(s) from the partial sample regression model.
relevance
Tx1 vector of relevance scores. Relevance is the sum of statistical similarity and informativeness; it measures the importance of an observation to the prediction. Its components are the informativeness of past circumstances, the informativeness of current circumstances, and the similarity of past circumstances to current circumstances.
similarity
Tx1 vector of statistical similarity, measured as the negative of the Mahalanobis distance from the past observations of the independent variables to their current values. Put simply, past observations that are like the current observations are more relevant.
informativeness
Tx1 vector of informativeness, measured as the Mahalanobis distance of the historical observations of the independent variables from their average values.
infoTheta
Tx1 vector of informativeness as measured by the Mahalanobis distance of the historical observations of the independent variables from the circumstances specified (theta).
weights
Tx1 vector of partial-sample regression weights.
fit
1xM fit values. Fit is the average alignment between relevance and outcomes across all observation pairs for a single prediction. A large value indicates that observations that are similarly relevant have similar outcomes, in which case one should have more confidence in the prediction. A small value indicates that relevance does not line up with the outcomes, in which case one should view the prediction more cautiously.
filter, included
Tx1 Dummy vector to indicate sub-sample periods that meet the threshold criteria.
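To make the similarity, informativeness, and relevance outputs above more concrete, the following Python sketch computes them for a small made-up data set. It is an illustration under stated assumptions rather than the add-in's implementation: all function names are hypothetical, the squared Mahalanobis distance is used with the sample covariance of x, and the factors of one half follow one common convention from the relevance literature; the exact scaling used by the function is not stated here. The last helper shows the full-sample case only (every observation retained), where the relevance-weighted average of outcomes reproduces the ordinary least squares forecast.

```python
def column_means(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sample_covariance(rows):
    mu = column_means(rows)
    n, k = len(rows), len(mu)
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]


def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mahalanobis_sq(u, v, cov):
    """Squared Mahalanobis distance (u - v)' cov^-1 (u - v)."""
    d = [a - b for a, b in zip(u, v)]
    return sum(a * b for a, b in zip(d, solve(cov, d)))


def psr_statistics(x, theta):
    """Return (similarity, informativeness, relevance), one value per row of x.

    Assumed convention: similarity = -1/2 * distance to theta,
    informativeness = distance to the sample mean, and
    relevance = similarity + 1/2 * (informativeness + informativeness of theta).
    """
    mu, cov = column_means(x), sample_covariance(x)
    info_theta = mahalanobis_sq(theta, mu, cov)
    similarity = [-0.5 * mahalanobis_sq(row, theta, cov) for row in x]
    informativeness = [mahalanobis_sq(row, mu, cov) for row in x]
    relevance = [s + 0.5 * (i + info_theta)
                 for s, i in zip(similarity, informativeness)]
    return similarity, informativeness, relevance


def full_sample_forecast(y, relevance):
    """Relevance-weighted forecast using every observation (no filtering)."""
    n = len(y)
    y_bar = sum(y) / n
    return y_bar + sum(r * (v - y_bar) for r, v in zip(relevance, y)) / (n - 1)


# Made-up data: five observations of two independent variables.
x = [[0.5, 1.0], [1.5, 0.2], [-0.7, 0.9], [2.1, -0.4], [0.3, 0.6]]
y = [1.2, 0.4, -0.8, 1.9, 0.1]
theta = x[-1]  # mirrors the default above: most recent cross-sectional values

similarity, informativeness, relevance = psr_statistics(x, theta)
print(relevance)
print(full_sample_forecast(y, relevance))
```

With this convention the relevance scores sum to zero across the sample, so observations with positive relevance pull the forecast toward their outcomes and those with negative relevance push it away. A partial-sample forecast would apply the same weighting to only the periods flagged by the filter output, with a rescaling that this sketch does not attempt to reproduce.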