API Docs

Public

Note

See the Algorithms and Parallelism section for the doc strings of particular algorithms.

Only shapley is exported.

Shapley.shapley (Function)
shapley(predict, a, X)
shapley(predict, a, X, j)
shapley(predict, a, X, Z)
shapley(predict, a, X, j, Z)

Compute the Shapley values of the points in a dataset Z, using the dataset X as an empirical estimate of the distribution of the points in Z. If Z is not provided, Shapley values are computed for the points in X. There are several inequivalent methods for computing Shapley values; the method is selected with a, a Shapley.Algorithm object which specifies how the computation should be done. The model being evaluated is encoded entirely by the predict function, which is used to make predictions on the dataset.

If predict is deterministic, and therefore returns an array of numbers, the returned Shapley values will be numerical. If the prediction is probabilistic, predict is expected to return an array of Distributions.Sampleable objects, and shapley will return a table whose columns are named according to the domain of the returned distributions, as determined via Distributions.support.

All algorithms assume that predict is most efficient when operating on batches of data, and therefore attempt to minimize the number of separate calls to predict.
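The following minimal sketch illustrates one such batched estimator, the permutation-sampling Monte Carlo method, for a plain numeric matrix and a deterministic predict. It is not the package's implementation: the name mc_shapley, its keyword arguments, and the sampling details are assumptions made for illustration. For each sample, every row of Z is perturbed once and all perturbed rows are passed to predict in a single call, so predict is called nsamples times per feature rather than once per data point.

```julia
using Random

# Hypothetical sketch of a permutation-sampling Monte Carlo Shapley estimate
# for feature j. `predict` maps a matrix (one data point per row) to a vector
# of real predictions; X supplies background points, Z the points to explain.
function mc_shapley(predict, X::AbstractMatrix, j::Integer, Z::AbstractMatrix = X;
                    nsamples::Integer = 256, rng::AbstractRNG = Random.default_rng())
    nz, p = size(Z)
    ϕ = zeros(nz)
    for _ in 1:nsamples
        with_j = Matrix{Float64}(undef, nz, p)
        without_j = Matrix{Float64}(undef, nz, p)
        for i in 1:nz
            x = X[rand(rng, axes(X, 1)), :]          # random background point
            order = randperm(rng, p)                 # random feature ordering
            kept = order[1:findfirst(==(j), order)]  # features up to and including j
            with_j[i, :] = x
            with_j[i, kept] = Z[i, kept]             # take those features from Z
            without_j[i, :] = with_j[i, :]
            without_j[i, j] = x[j]                   # except feature j itself
        end
        y = predict(vcat(with_j, without_j))         # one batched call per sample
        ϕ .+= y[1:nz] .- y[(nz + 1):end]
    end
    return ϕ ./ nsamples
end
```

With a linear predict(M) = M * w and a background X whose rows are all identical, each sample contributes exactly w[j] * (Z[i, j] - X[1, j]), which makes the sketch easy to check by hand.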

See the Shapley.jl documentation for more details and examples.

Arguments

  • predict: A function which takes a Tables-compatible object (or a matrix) and returns a vector of either deterministic predictions as real numbers or probabilistic predictions as Distributions.Sampleable objects.
  • a: A Shapley.Algorithm object which specifies how the computation should be performed. See subtypes(Shapley.Algorithm) for a list of available algorithms.
  • X: An input dataset which the Shapley algorithms use as an empirical estimate of the distribution of the independent variables. Typically this would be a test or training set.
  • j: The feature to calculate the Shapley values for, as a column of X. It should be an integer or a Symbol. If not provided, Shapley values will be computed for all columns.
  • Z: The set of data points to compute the Shapley values of. If not provided, X will be used.

Examples

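using MLJ, Shapley
RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
m = fit!(machine(RandomForestRegressor(), X, y))  # a trained MLJ machine object
ϕ = shapley(ξ -> predict(m, ξ), Shapley.MonteCarlo(512), X)

Shapley.summary (Function)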
summary(ϕ, js=colnames(ϕ); statistics=(;))
summary(predict, a, X; statistics=(;), data=X)

Create a summary table describing the statistics of the Shapley values of features js, computed from the prediction function predict using X as an empirical estimate of the distribution of the data. By default, the summary includes the mean of the absolute value, the root mean square, and the standard deviation. Additional statistics can be specified via the named tuple statistics.

Note that the returned table is a Vector of NamedTuples. Since it is Tables-compatible, it can easily be converted to a more convenient form, for example DataFrame(summary(args...)) for a DataFrame.

Currently, summary does not work for classifiers because their predictions have a different output type. For classifiers, start from shapley instead.

Arguments

  • ϕ: A table of Shapley values, e.g. as returned from the shapley function.
  • predict: A function which takes a Tables-compatible object (or a matrix) and returns a vector of either deterministic predictions as real numbers or probabilistic predictions as Distributions.Sampleable objects.
  • a: A Shapley.Algorithm object which specifies how the computation should be performed. See subtypes(Shapley.Algorithm) for a list of available algorithms.
  • X: An input dataset which the Shapley algorithms use as an empirical estimate of the distribution of the independent variables. Typically this would be a test or training set.
  • js: The feature (or features, as an iterable) to compute Shapley statistics for.
  • data: The dataset to compute Shapley statistics for.
  • statistics: A named tuple of additional statistics. The keys should be the names of the columns in the summary table and the values should be functions which accept a Vector{<:Real}.
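A minimal sketch of the default statistics, assuming the Shapley values are stored as a NamedTuple of column vectors (one of the simplest Tables-compatible forms). The function name summary_sketch and the column names feature, mean_abs, rms and stdev are illustrative assumptions, not the names used by the package.

```julia
using Statistics

# Hypothetical sketch: summarize a column table of Shapley values, given as a
# NamedTuple of equal-length vectors (one column per feature). Returns a
# Vector of NamedTuples, one row per feature in js.
function summary_sketch(ϕ::NamedTuple, js = keys(ϕ); statistics::NamedTuple = NamedTuple())
    js isa Symbol && (js = (js,))                 # allow a single feature
    map(collect(js)) do j
        v = ϕ[j]
        defaults = (feature = j,
                    mean_abs = mean(abs, v),      # mean of the absolute value
                    rms = sqrt(mean(abs2, v)),    # root mean square
                    stdev = std(v))               # standard deviation
        extras = map(f -> f(v), statistics)       # user-supplied statistics
        merge(defaults, extras)
    end
end
```

Returning a Vector of NamedTuples keeps the result Tables-compatible, so it can be passed directly to any Tables.jl sink.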

All

Shapley.Shapley (Module)
Shapley

This module contains methods for computing Shapley values for machine learning models.

Only the function shapley is exported.

Shapley.Algorithm (Type)
Algorithm{P<:AbstractResource}

Abstract type from which algorithms for computing Shapley values are descended. The type parameter is the type of computational resource to be used, but individual algorithms are not required to support all possible resources.
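The pattern of parameterizing an algorithm by its computational resource can be sketched as follows. AbstractResource, CPU1 and CPUThreads are names taken from this page, but the definitions here are simplified stand-ins; ToyMonteCarlo, SerialOnly and resourcetype are hypothetical.

```julia
# Simplified stand-ins for the resource types (names taken from this page).
abstract type AbstractResource end
struct CPU1 <: AbstractResource end
struct CPUThreads <: AbstractResource end

# The algorithm type is parameterized by the resource it runs on.
abstract type Algorithm{P<:AbstractResource} end

# Hypothetical algorithm that supports any resource, defaulting to CPU1.
struct ToyMonteCarlo{P<:AbstractResource} <: Algorithm{P}
    nsamples::Int
    resource::P
end
ToyMonteCarlo(nsamples::Int) = ToyMonteCarlo(nsamples, CPU1())

# Hypothetical algorithm that only supports single-threaded execution.
struct SerialOnly <: Algorithm{CPU1} end

# Recover the resource type from any algorithm.
resourcetype(::Algorithm{P}) where {P} = P
```

Fixing the parameter in the supertype, as SerialOnly does, is one way for an algorithm to opt out of resources it does not support.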

Shapley.:⊟ (Method)
a ⊟ b

A subtraction operation which can optionally take as arguments arrays of Distributions.Sampleable objects. In this case, the result will be a table whose columns are determined by the domain (Distributions.support) of the returned distributions. This is needed by Shapley algorithms to handle the objects returned by predict functions in both the deterministic and probabilistic cases.

For all other cases, this operation is equivalent to a - b.
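The two cases can be sketched in a few lines. To stay self-contained, the sketch replaces Distributions.Sampleable with a hypothetical stand-in, a Dict mapping each class to its probability, and it assumes that subtracting two probabilistic predictions means subtracting their class probabilities; the docstring does not specify this.

```julia
# Hypothetical stand-in for a probabilistic prediction: class => probability.
const PMF = Dict{Symbol,Float64}

# Deterministic predictions: equivalent to a - b.
⊟(a::AbstractVector{<:Real}, b::AbstractVector{<:Real}) = a - b

# Probabilistic predictions (assumption: subtract class probabilities).
# The result is a column table with one column per element of the support.
function ⊟(a::AbstractVector{PMF}, b::AbstractVector{PMF})
    support = sort!(collect(keys(first(a))))
    columns = map(support) do k
        [get(p, k, 0.0) - get(q, k, 0.0) for (p, q) in zip(a, b)]
    end
    return NamedTuple{Tuple(support)}(Tuple(columns))
end
```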

Shapley._map (Method)
_map(res::AbstractResource)

Get the appropriate map function associated with the resource. For CPU1 this is Base.map whereas for CPUThreads it is ThreadsX.map.
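Dispatching on the resource type to choose a map function can be sketched as follows. ThreadsX is a third-party package, so this sketch substitutes a hypothetical threaded_map built only on Base.Threads; the resource types are simplified stand-ins for the package's own.

```julia
# Simplified stand-ins for the resource types named in the docstring.
abstract type AbstractResource end
struct CPU1 <: AbstractResource end
struct CPUThreads <: AbstractResource end

# Hypothetical stand-in for ThreadsX.map, using only Base.Threads:
# spawn one task per element and collect the results in order.
function threaded_map(f, xs::AbstractVector)
    tasks = [Threads.@spawn f(x) for x in xs]
    return fetch.(tasks)
end

_map(::CPU1) = map
_map(::CPUThreads) = threaded_map
```

Because the choice is made by dispatch, algorithm code can call _map(resource)(f, xs) without branching on the resource itself.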

Shapley.coalitions (Function)
coalitions(N, M, ν=(N-1))

Return a boolean matrix z whose rows are all distinct binary vectors of length N containing up to ν non-zeros, ordered so that rows with fewer non-zero elements come before rows with more. The resulting matrix is repeated M times.

The rows of z are the "coalition" vectors needed by the KernelSHAP method such that they can be used to provide the best possible approximation for a fixed ν.
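The ordering described above can be reproduced with a short sketch that enumerates bit patterns directly. This is not the package's method (the coalitions! docstring describes a recursive construction), and two details are assumptions rather than documented behaviour: the all-zero row is included, and the M copies are stacked vertically.

```julia
# Hypothetical sketch of coalition generation: enumerate the N-bit patterns
# directly instead of recursing. Assumes the all-zero row is included and the
# M repetitions are stacked vertically.
function coalitions_sketch(N::Integer, M::Integer, ν::Integer = N - 1)
    codes = [c for c in 0:(2^N - 1) if count_ones(c) <= ν]
    sort!(codes; by = count_ones, alg = MergeSort)   # stable: fewer non-zeros first
    z = falses(length(codes), N)
    for (row, c) in enumerate(codes), col in 1:N
        z[row, col] = isodd(c >> (col - 1))          # bit (col - 1) of c
    end
    return repeat(z, M)
end
```

Enumerating integers guarantees there are no duplicate rows, and the stable sort by count_ones gives the "fewer non-zeros first" order.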

Shapley.coalitions! (Method)
coalitions!(r, z, n, k, α)

Write all possible binary vectors containing n non-zeros to the rows of the boolean matrix z, starting from position k and row α. This can be applied recursively to generate all binary vectors, without duplicates, in order of increasing number of non-zeros; see coalitions.

Shapley.colnames (Method)
colnames(X)

Get the column names of either a matrix or a table. If a matrix, uses default Tables.jl column names.

Shapley.compatibletable (Method)
compatibletable(X)

Create a Tables.Columns object from X. This function is provided so that tables and arrays can be handled the same way.

Shapley.defaultkernel (Method)
defaultkernel(M, zc)

The weighting kernel used by the kernel SHAP method. The resulting weight function grants more importance to coalitions in which fewer features participate.

Shapley.featureindex (Method)
featureindex(X, j::Union{Integer,Symbol})

Return the index corresponding to key j in the columns of Tables-compatible X. E.g. if j is an integer this simply returns j.
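For a table stored as a NamedTuple of column vectors, the lookup can be sketched as follows; the name featureindex_sketch and the error raised for an unknown column are assumptions for illustration.

```julia
# Hypothetical sketch for a column table stored as a NamedTuple of vectors.
featureindex_sketch(X::NamedTuple, j::Integer) = j    # integers are already indices

function featureindex_sketch(X::NamedTuple, j::Symbol)
    i = findfirst(==(j), keys(X))                     # position of the named column
    i === nothing && throw(ArgumentError("no column named $j"))
    return i
end
```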

Shapley.fh (Method)
fh(f, rng, z, X, Z)

Call the prediction function f on data points whose features have been stochastically replaced by the function h!.

Shapley.fitargs (Method)
fitargs(f, m::KernelSHAP, X, Z, z)

Generate the arguments needed by the kernel SHAP method to perform a fit.

Shapley.fitpoint (Method)
fitpoint(m::KernelSHAP, lhs, rhs)
fitpoint(m::KernelSHAP, lhs, rhs, i)

Perform the fit for the Kernel SHAP methods on the data provided by fitargs. If i is provided, this will be done for the specific data point i.
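At its core, a Kernel SHAP fit is a weighted linear regression of model outputs on coalition vectors. The sketch below shows only that step and is not the package's fitpoint: the name wls_fit is hypothetical, the kernel weights are taken as given, and it omits the intercept (base value) and any constraint that the coefficients sum to the difference between the prediction and the base value, which Kernel SHAP implementations commonly add.

```julia
using LinearAlgebra

# Hypothetical sketch of a weighted least-squares fit: find β minimizing
# Σ_r w[r] * (y[r] - Zc[r, :] ⋅ β)^2, where the rows of Zc are coalition
# vectors, y the corresponding model outputs and w the kernel weights.
function wls_fit(Zc::AbstractMatrix{<:Real}, y::AbstractVector{<:Real},
                 w::AbstractVector{<:Real})
    s = sqrt.(w)
    return (Zc .* s) \ (y .* s)   # scale each row by √w, then ordinary least squares
end
```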

Shapley.fitresultrow (Method)
fitresultrow(m::KernelSHAP, sch::Tables.Schema, lhs, rhs, i)

Return the end result of kernel SHAP for a particular data point.

Shapley.h! (Function)
h!(ξt, rng, zr, Xt=ξt)

Perform the stochastic replacement of data specified by the row zr of the coalition matrix, as required by the Kernel SHAP method.
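As a rough illustration of stochastic replacement, the sketch below assumes that entries of zr equal to true mark features that keep their original values, while the remaining features are overwritten with values taken from a randomly chosen row of the background data. The name replace_absent! and this true/false convention are assumptions; the docstring does not state them.

```julia
using Random

# Hypothetical sketch of stochastic replacement for one coalition row zr:
# features flagged true keep their values; the others are overwritten with
# the values of a randomly drawn row of the background data X.
function replace_absent!(ξ::AbstractMatrix, rng::AbstractRNG,
                         zr::AbstractVector{Bool}, X::AbstractMatrix = ξ)
    src = X === ξ ? copy(X) : X        # sample from unmodified data
    absent = findall(!, zr)            # features outside the coalition
    for i in axes(ξ, 1)
        r = rand(rng, axes(src, 1))
        ξ[i, absent] = src[r, absent]
    end
    return ξ
end
```

Copying when X and ξ are the same array prevents later rows from sampling values that were already replaced earlier in the loop.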

Shapley.imputeshuffled! (Method)
imputeshuffled!(rng, j, ξt, Xt, Zt)

Impute shuffled data into an empty dataset created for the Shapley Monte Carlo algorithm.

Shapley.isindependent (Method)
isindependent(a::Algorithm)

Whether the algorithm can compute Shapley values for each feature independently. If true, it is efficient to call the algorithm for only a particular feature. If false, calling shapley on a particular feature also computes the Shapley values of all other features and then discards them, so calling shapley on subsets of features is not recommended for such algorithms.
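The cost difference can be illustrated with a toy trait. Everything here is hypothetical: PerFeature and AllAtOnce stand in for independent and non-independent algorithms, and estimate is a cheap placeholder for a real per-feature Shapley computation, instrumented with a counter.

```julia
# Hypothetical toy algorithms illustrating the isindependent trait.
abstract type ToyAlgorithm end
struct PerFeature <: ToyAlgorithm end   # can compute a single feature on its own
struct AllAtOnce <: ToyAlgorithm end    # always computes every feature together

isindependent_sketch(::PerFeature) = true
isindependent_sketch(::AllAtOnce) = false

const work = Ref(0)   # number of per-feature computations performed

# Placeholder standing in for the (expensive) estimate for feature j.
estimate(X::AbstractMatrix, j::Integer) = (work[] += 1; sum(X[:, j]))

function value_for_feature(a::ToyAlgorithm, X::AbstractMatrix, j::Integer)
    if isindependent_sketch(a)
        return estimate(X, j)                           # only feature j
    else
        everything = [estimate(X, k) for k in axes(X, 2)]
        return everything[j]                            # the rest is discarded
    end
end
```

Both paths return the same value, but the non-independent path does one unit of work per column instead of one in total.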

Shapley.montecarlo (Method)
montecarlo(predict, m, X, j, Z)

A single iteration of the Shapley Monte Carlo algorithm.

Shapley.ncolumns (Method)
ncolumns(X)

Get the number of columns of either a matrix or a table.

Shapley.nrows (Method)
nrows(X)

Get the number of rows of either a matrix or a table.
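For the two input forms mentioned in these docstrings, the size and name helpers can be sketched as follows, representing a table as a NamedTuple of equal-length column vectors. The _sketch names are hypothetical; the Column1, Column2, ... names mirror the default column names Tables.jl gives to matrices.

```julia
# Column tables are represented here as NamedTuples of equal-length vectors.
nrows_sketch(X::AbstractMatrix) = size(X, 1)
nrows_sketch(X::NamedTuple) = length(first(X))

ncolumns_sketch(X::AbstractMatrix) = size(X, 2)
ncolumns_sketch(X::NamedTuple) = length(X)

# Matrices get default names Column1, Column2, ...
colnames_sketch(X::AbstractMatrix) = [Symbol("Column", k) for k in 1:size(X, 2)]
colnames_sketch(X::NamedTuple) = collect(keys(X))
```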

Shapley.ntcopy (Method)
ntcopy(x)

Copy an object. This defaults to Base.copy, except for NamedTuples, for which each column is copied.
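The distinction matters because a NamedTuple is immutable but its column vectors are not: copying column by column ensures that mutating a column of the copy leaves the original untouched. A minimal sketch with a hypothetical name:

```julia
# Hypothetical sketch: fall back to copy, but copy a NamedTuple column by column.
ntcopy_sketch(x) = copy(x)
ntcopy_sketch(x::NamedTuple) = map(copy, x)
```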

Shapley.opnamedtuple (Function)
opnamedtuple(op, ϕ, n=Symbol(string(op)))

Create a named tuple whose elements are the operation op applied to the columns of the Tables-compatible object ϕ. The names of the resulting columns are those of the columns of ϕ, prefixed with n and an underscore.
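A minimal sketch of the naming rule, assuming ϕ is a NamedTuple of column vectors; the name opnamedtuple_sketch is hypothetical.

```julia
# Hypothetical sketch: apply op to every column of ϕ and name the results
# n_<column>, where n defaults to the name of op.
function opnamedtuple_sketch(op, ϕ::NamedTuple, n::Symbol = Symbol(string(op)))
    newnames = Tuple(Symbol(n, "_", k) for k in keys(ϕ))
    return NamedTuple{newnames}(map(op, values(ϕ)))
end
```

For example, with op = sum and columns a and b, the result has the fields sum_a and sum_b.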

Shapley.shuffledata (Method)
shuffledata(rng, X, j, Z)

Return shuffled instances of Z as needed by the Shapley Monte Carlo algorithm.

Shapley.similartable (Method)
similartable(tab)

Create a named tuple whose values are obtained by calling similar on the corresponding columns of the table tab.
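For a table stored as a NamedTuple of vectors this amounts to mapping similar over the columns, which yields uninitialized columns with the same element types and lengths, ready to be filled in. The function name is hypothetical.

```julia
# Hypothetical sketch: allocate an uninitialized column of the same type and
# length for every column of a NamedTuple-of-vectors table.
similartable_sketch(tab::NamedTuple) = map(similar, tab)
```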

Shapley.supports_classification (Method)
supports_classification(a::Algorithm)

Whether the algorithm supports classifiers. If false, the user is responsible for providing a prediction function which returns numerical arrays.

Shapley.tableop (Method)
tableop(op, args...)

Apply op element-wise to the arguments args. The first argument must be a table that complies with the Tables interface. If a subsequent argument is a scalar, this calls op.(args[1], args[2:end]...).
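One reading of this behaviour, sketched for tables stored as NamedTuples of vectors: op is broadcast column by column, later tables are matched to the first by column name, and scalars are broadcast as they are. The name tableop_sketch and the matching-by-name rule are assumptions, not documented behaviour.

```julia
# Hypothetical sketch: apply op element-wise, column by column. The first
# argument is a NamedTuple of columns; later arguments may be tables with the
# same column names (matched by name) or scalars (broadcast unchanged).
function tableop_sketch(op, tab::NamedTuple, args...)
    column(a, k) = a isa NamedTuple ? a[k] : a
    newcols = map(k -> op.(tab[k], map(a -> column(a, k), args)...), keys(tab))
    return NamedTuple{keys(tab)}(newcols)
end
```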