Algorithms and Parallelism
As has been discussed, in nearly all cases Shapley values are impossible to compute exactly, both due to computational intractibility and lack of knowledge of the true distribution.
This package's API allows users to use a common interface. It consists of Algorithm
objects which store parameters associated with the particular algorithm.
Additionally, these objects describe what parallelism, if any, should be used in the computation. These are associated with the algorithms because different algorithms may have different ways of utilizing parallelism.
This package uses ComputationalResources.jl to describe which computational resources, and which degree of parallelism should be used in a computation.
For the time being, the package only provides the Shaplye.MonteCarlo
algorithm.
Available Algorithms
Shapley.MonteCarlo
Shapley.MonteCarlo
— TypeMonteCarlo{P<:AbstractResource,R<:AbstractRNG}
A Shapley.Algorithm
object for describing the computation of Shapley values via the monte carlo algorithm introduced by Strumbelj, Kononenko 2014.
The res
argument in the constructors specifies a computational resource for executing the computation. Currently available options are CPU1()
(single threaded) and CPUThreads()
(all threads available to the Julia process). These are provided by the ComputationalResources.jl package.
Constructors
MonteCarlo(res, M, rng=Random.GLOBAL_RNG)
MonteCarlo(M, rng=Random.GLOBAL_RNG)
Arguments
res
: AnAbstractResource
computational resource. PassCPUThreads()
to run multi-threaded.M
: The total number of iterations (samples).rng
: A random number generator object to use.
Computational Resources:
CPU1()
: Single threaded.CPUThreads()
: Independent iterations of the monte carlo algorithm will be computed on separate threads. See the Julia multi-threading documentation for instructions on how to set the number of CPU threads available to the Julia process.
Shapley.KernelSHAP
Shapley.KernelSHAP
— TypeKernelSHAP{P<:AbstractResource,R<:AbstractRNG,NT<:NamedTuple}
A Shapley.Algorithm
object for describing the computation of Shapley values via the Kernel SHAP algorithm.
The res
argument in the constructors specfies a computational resource, currently either CPU1()
or CPUThreads()
is available.
This algorithm uses the lasso regression from Lasso.jl to perform regularized linear regressions. Note that because this algorithm involves a lasso fit for each individual data point, it is incredibly slow.
Constructors
KernelSHAP(res, ν, M, rng=Random.GLOBAL_RNG; ϕ₀_colname=:ϕ₀, kwargs...)
KernelSHAP(res, ν, M, rng=Random.GLOBAL_RNG; ϕ₀_colname=:ϕ₀, kwargs...)
Arguments
res
: AnAbstractResource
computational resource. PassCPUThreads()
to run multi-threaded.ν
: The number of distinct coalitions to use. It cannot excedefloor(N/2)
whereN
is the number of features.M
: The multiplicity of sample data points per coalition.ϕ₀_colname
: The column name to use for the intercept Shapley value. Ifnothing
, colum will not be included.kwargs
: Keyword arguments used by the fit. The fit is performed by Lasso.jl, see here for available arguments.
Computational Resources:
CPU1()
: Single threaded.CPUThreads()
: Linear regressions will be performed on different threads.
Using KernelSHAP
effictively will require a good understanding of the "lasso" ($L_1,L_2$ regularized linear regressions) regression from the Lasso.jl package. In particular see the documentation for fit
. The fit method called by KernelSHAP
is fit(LassoModel, X, y; kwargs...)
.