API Docs
Public
See the Algorithms and Parallelism section for the doc strings of particular algorithms.
Only shapley is exported.
Shapley.shapley — Function
shapley(predict, a, X)
shapley(predict, a, X, j)
shapley(predict, a, X, Z)
shapley(predict, a, X, j, Z)
Compute the Shapley values of the points in a dataset Z, using the dataset X as an empirical estimate of the distribution of the points in Z. If Z is not provided, Shapley values will be computed for the points in X. There are a variety of inequivalent methods for computing Shapley values; the method is selected with the argument a, a Shapley.Algorithm object which specifies how the computation should be done. The predict function will be used to make predictions on the dataset. The model object being evaluated is encoded via the predict function.
If the prediction returned by predict is deterministic, and therefore an array of numbers, the returned Shapley values will be numerical. If the prediction is probabilistic, it is expected that predict returns an array of Distributions.Sampleable objects and Shapley will return a table. The columns of this table will be named according to the domain of the distributions returned, as determined via Distributions.support.
All algorithms will assume that the function predict is most efficient when operating on batches of data. That is, they will attempt to minimize the number of separate calls to predict.
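To illustrate the batching idea, here is a minimal sketch of a permutation-sampling Monte Carlo estimate for a single feature, written in plain Julia. It is not the implementation used by Shapley.jl: the name toy_shapley_mc, its signature, and the choice of one random permutation per background row are assumptions made for illustration. The sketch builds every perturbed data point first and then calls predict exactly once.

```julia
using Random, Statistics

# Hypothetical helper, not part of Shapley.jl.
# Estimates the Shapley value of feature `j` at the point `z`, using the rows
# of `X` as background samples. `predict` maps a matrix (one data point per
# row) to a vector of real predictions.
function toy_shapley_mc(predict, X::AbstractMatrix, z::AbstractVector, j::Integer;
                        rng=Random.default_rng())
    n, p = size(X)
    with_j = Matrix{Float64}(undef, n, p)     # feature j taken from z
    without_j = Matrix{Float64}(undef, n, p)  # feature j taken from the background
    for i in 1:n
        order = randperm(rng, p)              # random feature ordering
        pos = findfirst(==(j), order)
        from_z = falses(p)
        from_z[order[1:pos-1]] .= true        # features that precede j come from z
        for k in 1:p
            without_j[i, k] = from_z[k] ? z[k] : X[i, k]
            with_j[i, k] = (from_z[k] || k == j) ? z[k] : X[i, k]
        end
    end
    # A single batched call covers all 2n perturbed points.
    y = predict(vcat(with_j, without_j))
    return mean(y[1:n] .- y[n+1:end])
end

# Usage with a linear model, counting how often `predict` is called.
w = [2.0, -1.0, 0.5]
ncalls = Ref(0)
linear(M) = (ncalls[] += 1; M * w)
X = [1.0 2.0 3.0; 4.0 5.0 6.0; 7.0 8.0 10.0]
z = [1.0, 1.0, 1.0]
ϕ2 = toy_shapley_mc(linear, X, z, 2; rng=MersenneTwister(1))
```

For a linear model each paired difference equals w[j] * (z[j] - X[i, j]), so the estimate reduces to w[j] * (z[j] - mean(X[:, j])) regardless of the sampled permutations, which makes the sketch easy to check by hand.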
See the Shapley.jl documentation for more details and examples.
Arguments
predict: A function which takes a Tables-compatible object (or a matrix) and returns a vector of deterministic predictions as real numbers or of probabilistic predictions as Distributions.Sampleable objects.
a: A Shapley.Algorithm object which specifies how the computation should be performed. See subtypes(Shapley.Algorithm) for a list of available algorithms.
X: An input dataset which is used by the Shapley algorithms for the purpose of providing an empirical estimation of the distribution of independent variables. Typically this would be a test or training set.
j: The feature to calculate the Shapley values for, as a column of X. Should be an integer or Symbol. If not provided, Shapley values will be computed for all columns.
Z: The set of data points to compute the Shapley values of. If not provided, X will be used.
Examples
m = machine(RandomForestRegressor(), X, y) # an MLJ machine object
ϕ = shapley(ξ -> predict(m, ξ), Shapley.MonteCarlo(512), X)
Shapley.summary — Function
summary(ϕ, js=colnames(ϕ); statistics=(;))
summary(predict, a, X; statistics=(;), data=X)
Create a summary table describing the statistics of Shapley values for features js from the prediction function predict using X as an empirical estimate of the distribution of the data. By default, this will provide summaries of the mean of the absolute value, the root mean square, and the standard deviation. Users can specify additional statistics via the named tuple statistics.
Note that the table returned is a Vector of NamedTuple. As a Tables-compatible object, this can easily be converted to a more convenient form, for example DataFrame(summary(args...)) for a DataFrame.
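The default statistics and the Vector of NamedTuple layout can be sketched in plain Julia as follows. This is an illustration only: toy_summary and the column names feature, mean_abs, rms and std are hypothetical, and the column names produced by Shapley.summary may differ.

```julia
using Statistics

# Hypothetical sketch, not the Shapley.jl implementation.
# `ϕ` is a column table (a NamedTuple of vectors) of Shapley values and
# `statistics` is a named tuple of extra functions applied to each column.
function toy_summary(ϕ::NamedTuple; statistics=(;))
    return map(collect(keys(ϕ))) do name
        v = ϕ[name]
        base = (feature=name,
                mean_abs=mean(abs, v),      # mean of the absolute value
                rms=sqrt(mean(abs2, v)),    # root mean square
                std=std(v))                 # standard deviation
        extra = map(f -> f(v), statistics)  # user-supplied statistics
        merge(base, extra)
    end
end

ϕ = (x1=[1.0, -1.0, 3.0, -3.0], x2=[0.5, 0.5, 0.5, 0.5])
s = toy_summary(ϕ; statistics=(max=maximum,))
```

Each element of s is one row of the summary, so any Tables-compatible sink can consume it directly.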
Currently summary does not work for classifiers because of the different output type. In those cases one should start from shapley.
Arguments
ϕ: A table of Shapley values, e.g. as returned from the shapley function.
predict: A function which takes a Tables-compatible object (or a matrix) and returns a vector of deterministic predictions as real numbers or of probabilistic predictions as Distributions.Sampleable objects.
a: A Shapley.Algorithm object which specifies how the computation should be performed. See subtypes(Shapley.Algorithm) for a list of available algorithms.
X: An input dataset which is used by the Shapley algorithms for the purpose of providing an empirical estimation of the distribution of independent variables. Typically this would be a test or training set.
js: The feature (or features, as an iterable) to compute Shapley statistics for.
data: The dataset to compute Shapley statistics for.
statistics: A named tuple of additional statistics. The keys should be the names of the columns in the summary table and the values should be functions which accept a Vector{<:Real}.
All
Shapley.Shapley — Module
Shapley
This module contains methods for computing Shapley values for machine learning methods.
Only the function shapley is exported.
Shapley.Algorithm — Type
Algorithm{P<:AbstractResource}
Abstract type from which algorithms for computing Shapley values are descended. The type parameter is the type of computational resource to be used, but individual algorithms are not required to support all possible resources.
Shapley.:⊟ — Method
a ⊟ b
A subtraction operation which can optionally take as arguments arrays of Distributions.Sampleable objects. In this case, the returned result will be a table the columns of which are determined by the domain (Distributions.support) of the returned distributions. This is needed by Shapley algorithms to handle the objects returned by predict functions in both deterministic and probabilistic cases.
For all other cases, this operation is equivalent to a - b.
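The two behaviours can be illustrated with a small stand-in that does not depend on Distributions.jl. Everything here is hypothetical: ToyCategorical replaces a real distribution type, boxminus replaces ⊟, and filling the table with per-class probability differences is an assumption about what the columns contain.

```julia
# Hypothetical stand-in for a categorical distribution: a support and the
# probability attached to each element of that support.
struct ToyCategorical
    support::Vector{Symbol}
    probs::Vector{Float64}
end

# Fallback: ordinary subtraction.
boxminus(a, b) = a - b

# Probabilistic case: return a column table (NamedTuple of vectors) with one
# column per element of the support. Assumes all entries share one support.
function boxminus(a::AbstractVector{ToyCategorical}, b::AbstractVector{ToyCategorical})
    classes = first(a).support
    cols = map(eachindex(classes)) do c
        [a[i].probs[c] - b[i].probs[c] for i in eachindex(a, b)]
    end
    return NamedTuple{Tuple(classes)}(Tuple(cols))
end

a = [ToyCategorical([:no, :yes], [0.2, 0.8]), ToyCategorical([:no, :yes], [0.6, 0.4])]
b = [ToyCategorical([:no, :yes], [0.5, 0.5]), ToyCategorical([:no, :yes], [0.5, 0.5])]
d = boxminus(a, b)          # table with columns :no and :yes
n = boxminus([3, 2], [1, 1]) # plain numeric subtraction
```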
Shapley._map — Method
_map(res::AbstractResource)
Get the appropriate map function associated with the resource. For CPU1 this is Base.map whereas for CPUThreads it is ThreadsX.map.
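The dispatch pattern described here can be sketched with Base alone. The resource types ToySerial and ToyThreads and the helper threaded_map are hypothetical stand-ins for CPU1, CPUThreads and ThreadsX.map, so this shows the idea rather than the package's code.

```julia
# Hypothetical resource types standing in for CPU1 and CPUThreads.
abstract type ToyResource end
struct ToySerial <: ToyResource end
struct ToyThreads <: ToyResource end

# A simple threaded map: spawn one task per element, then collect the results
# in the original order.
threaded_map(f, xs) = map(fetch, [Threads.@spawn f(x) for x in xs])

# Select the map implementation by dispatching on the resource type.
toy_map(::ToySerial) = map
toy_map(::ToyThreads) = threaded_map

serial = toy_map(ToySerial())(x -> x^2, 1:4)
threaded = toy_map(ToyThreads())(x -> x^2, 1:4)
```

Because the choice is made by dispatch, an algorithm can be written once against the returned function and run serially or in parallel depending on its resource parameter.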
Shapley.coalitions — Function
coalitions(N, M, ν=(N-1))
Returns a boolean matrix z the rows of which give all possible permutations of binary vectors of length N containing up to ν non-zeros, where rows with a smaller number of non-zero elements come before rows with a larger number of non-zero elements and there are no duplicates. The resulting matrix is repeated M times.
The rows of z are the "coalition" vectors needed by the KernelSHAP method such that they can be used to provide the best possible approximation for a fixed ν.
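The row ordering can be made concrete with a small enumeration in plain Julia. This is a sketch under stated assumptions, not the package's algorithm: toy_coalitions is a hypothetical name, it enumerates via bit masks, it starts at one non-zero (the all-false row is left out), and it omits the M-fold repetition.

```julia
# Hypothetical sketch: all distinct binary vectors of length N with between
# 1 and ν non-zeros, ordered by number of non-zeros.
function toy_coalitions(N::Integer, ν::Integer=N - 1)
    # Each integer mask encodes one binary vector; keep those with ≤ ν bits set.
    masks = [m for m in 1:(2^N - 1) if count_ones(m) <= ν]
    # Fewer non-zeros first; ties broken by the mask value for a fixed order.
    sort!(masks; by=m -> (count_ones(m), m))
    # Row r, column i is true when bit i of the r-th mask is set.
    return [((m >> (i - 1)) & 1) == 1 for m in masks, i in 1:N]
end

z = toy_coalitions(3, 2)
```

For N = 3 and ν = 2 this gives six rows: the three single-feature coalitions followed by the three two-feature coalitions.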
Shapley.coalitions! — Method
coalitions!(r, z, n, k, α)
Write all possible binary vectors to the rows of boolean matrix z containing n non-zeros starting from position k and row α. This can be used recursively to generate all possible permutations of binary vectors such that those with a smaller number of non-zeros occur first without duplicates, see coalitions.
Shapley.colnames — Method
colnames(X)
Get the column names of either a matrix or a table. If a matrix, uses default Tables.jl column names.
Shapley.compatibletable — Method
compatibletable(X)
Create a Tables.Columns object from X. This function is provided so that tables and arrays can be handled the same way.
Shapley.defaultkernel — Method
defaultkernel(M, zc)
The weighting kernel used by the kernel SHAP method. The resulting weight function grants more importance to coalitions in which fewer features participate.
Shapley.featureindex — Method
featureindex(X, j::Union{Integer,Symbol})
Return the index corresponding to key j in the columns of Tables-compatible X. For example, if j is an integer this simply returns j.
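A minimal sketch of this lookup, assuming the table is a NamedTuple of column vectors (one of the column-table forms accepted by Tables.jl). The name toy_featureindex is hypothetical and the real method may handle other table types differently.

```julia
# Hypothetical sketch: an integer key is already an index; a Symbol key is
# looked up among the column names.
toy_featureindex(X::NamedTuple, j::Integer) = j
toy_featureindex(X::NamedTuple, j::Symbol) = findfirst(==(j), keys(X))

X = (age=[31, 45], income=[52.0, 61.5])
by_int = toy_featureindex(X, 2)
by_name = toy_featureindex(X, :income)
```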
Shapley.fh — Method
fh(f, rng, z, X, Z)
Call the prediction function f on data points with stochastic replacement from the function h!.
Shapley.fitargs — Method
fitargs(f, m::KernelSHAP, X, Z, z)
Generate the arguments needed by the kernel SHAP method to perform a fit.
Shapley.fitpoint — Method
fitpoint(m::KernelSHAP, lhs, rhs)
fitpoint(m::KernelSHAP, lhs, rhs, i)
Perform the fit for the Kernel SHAP methods on the data provided by fitargs. If i is provided, this will be done for the specific data point i.
Shapley.fitresultrow — Method
fitresultrow(m::KernelSHAP, sch::Tables.Schema, lhs, rhs, i)
Return the end result of kernel SHAP for a particular data point.
Shapley.h! — Function
h!(ξt, rng, zr, Xt=ξt)
Perform the stochastic replacement of data prescribed by the row zr of the coalition matrix, following the Kernel SHAP method.
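A rough, non-mutating illustration of such a replacement step for a single data point is given below. It follows the usual Kernel SHAP convention, assumed here rather than taken from the package, that a true entry in the coalition row keeps the feature of the point being explained and a false entry swaps in the value from a randomly drawn background row. The name toy_replace and its argument order are hypothetical.

```julia
using Random

# Hypothetical sketch: keep ξ[k] where zr[k] is true, otherwise take the value
# of feature k from one randomly chosen row of the background data X.
function toy_replace(rng::AbstractRNG, zr::AbstractVector{Bool},
                     ξ::AbstractVector, X::AbstractMatrix)
    i = rand(rng, 1:size(X, 1))   # background row used for the replacement
    return [zr[k] ? ξ[k] : X[i, k] for k in eachindex(ξ)]
end

X = [10.0 20.0 30.0; 11.0 21.0 31.0]
ξ = [1.0, 2.0, 3.0]
mixed = toy_replace(MersenneTwister(0), [true, false, true], ξ, X)
```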
Shapley.imputeshuffled! — Method
imputeshuffled!(rng, j, ξt, Xt, Zt)
Impute shuffled data into an empty dataset created for the Shapley Monte Carlo algorithm.
Shapley.isindependent — Method
isindependent(a::Algorithm)
Whether the algorithm can compute Shapley values independently. If the result is true, then it is efficient to call the algorithm for only a particular feature. If false, when calling shapley on a particular feature, the Shapley values for other features will be computed and discarded. It is therefore not recommended to call shapley for subsets of features for algorithms for which isindependent is false.
Shapley.montecarlo — Method
montecarlo(predict, m, X, j, Z)
A single iteration of the Shapley Monte Carlo algorithm.
Shapley.ncolumns — Method
ncolumns(X)
Get the number of columns of either a matrix or a table.
Shapley.nrows — Method
nrows(X)
Get the number of rows of either a matrix or a table.
Shapley.ntcopy — Method
ntcopy(x)
Copy an object. This defaults to Base.copy except on NamedTuples, where it copies each column.
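The behaviour described can be sketched in two methods. The name toy_ntcopy is hypothetical, and the sketch assumes the NamedTuple holds column vectors.

```julia
# Hypothetical sketch: generic objects are copied with Base.copy, while a
# NamedTuple of columns gets a fresh copy of every column.
toy_ntcopy(x) = copy(x)
toy_ntcopy(x::NamedTuple) = map(copy, x)

tab = (a=[1, 2, 3], b=[0.5, 1.5, 2.5])
dup = toy_ntcopy(tab)
dup.a[1] = 99   # mutating the copy leaves the original column untouched
```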
Shapley.opnamedtuple — Function
opnamedtuple(op, ϕ, n=Symbol(string(op)))
Create a named tuple the elements of which are the operation op applied to the columns of Tables-compatible object ϕ. The names of the columns of the resulting table will be those of the columns of ϕ prepended by n*'_'.
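The naming rule is easiest to see on a small column table. The sketch below assumes ϕ is a NamedTuple of vectors; toy_opnamedtuple is a hypothetical name and not the package function.

```julia
# Hypothetical sketch: apply `op` to every column and prefix each column name
# with `n` followed by an underscore.
function toy_opnamedtuple(op, ϕ::NamedTuple, n::Symbol=Symbol(string(op)))
    names = map(k -> Symbol(n, '_', k), keys(ϕ))   # e.g. :a becomes :sum_a
    return NamedTuple{names}(map(op, values(ϕ)))
end

ϕ = (a=[1, 2], b=[3, 4])
sums = toy_opnamedtuple(sum, ϕ)
maxes = toy_opnamedtuple(maximum, ϕ, :max)
```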
Shapley.shuffledata — Method
shuffledata(rng, X, j, Z)
Return shuffled instances of Z as needed by the Shapley Monte Carlo algorithm.
Shapley.similartable — Method
similartable(tab)
Creates a named tuple the values of which are obtained by calling similar on the corresponding columns of the table tab.
Shapley.supports_classification — Method
supports_classification(a::Algorithm)
Whether the algorithm supports classifiers. If false, the user is responsible for providing a prediction function which returns numerical arrays.
Shapley.tableop — Method
tableop(op, args...)
Apply op element-wise on arguments args. The first argument must be a table that complies with the Tables interface. If a subsequent element is a scalar, this calls op.(args[1], args[2:end]...).
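As a rough illustration of element-wise table operations, the sketch below handles a table combined with a scalar and a table combined with another table of the same columns. It assumes tables are NamedTuples of vectors and applies op column by column; toy_tableop is a hypothetical name and the package may implement the scalar case differently.

```julia
# Hypothetical sketch: broadcast `op` over each column of a column table.
toy_tableop(op, tab::NamedTuple, s::Number) = map(col -> op.(col, s), tab)

# Two tables with identical column names are combined column by column.
toy_tableop(op, tab::NamedTuple, other::NamedTuple) =
    map((a, b) -> op.(a, b), tab, other)

t = (a=[1, 2], b=[3, 4])
scaled = toy_tableop(*, t, 2)
diffs = toy_tableop(-, t, (a=[1, 1], b=[1, 1]))
```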