Explainable AI

How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.

Explainable AI (XAI) is the set of methods, techniques, and tools that make the outputs, decisions, and internal workings of AI systems understandable to humans. As AI is deployed in high-stakes domains — medical diagnosis, credit scoring, criminal justice, autonomous systems — the ability to explain why a model made a particular decision becomes essential for trust, accountability, debugging, and regulatory compliance. XAI bridges the gap between powerful black-box models and the human need to understand, audit, and challenge automated decisions.

Remembering

  • Explainability — The degree to which a model's behavior can be understood and explained to humans.
  • Interpretability — The degree to which the internal mechanisms of a model can be directly understood; often used interchangeably with explainability.
  • Black-box model — A model whose internal workings are opaque; predictions are produced but the reasoning is not accessible (deep neural networks, ensemble methods).
  • Glass-box model — An inherently interpretable model whose reasoning is transparent by design (linear regression, decision trees, rule lists).
  • Post-hoc explanation — An explanation generated after a model is trained, attempting to approximate or explain its behavior (SHAP, LIME).
  • SHAP (SHapley Additive exPlanations) — A game-theoretic framework that assigns each feature a contribution value for each prediction.
  • LIME (Local Interpretable Model-agnostic Explanations) — A technique that approximates complex model behavior locally with a simple interpretable model.
  • Feature importance — A ranking of input features by their overall contribution to the model's predictions.
  • Saliency map — A visualization highlighting which input pixels or regions most influenced a neural network's output on an image.
  • Grad-CAM — A gradient-based visualization technique showing which image regions a CNN attended to for a prediction.
  • Counterfactual explanation — An explanation of the form "if X had been different, the prediction would have changed to Y."
  • Anchors — Rule-based explanations that identify sufficient conditions guaranteeing a prediction regardless of changes to other features.
  • Model card — A documentation framework that describes a model's intended use, performance, and limitations to stakeholders.
  • Right to explanation — A legal concept (GDPR Article 22) granting individuals the right to understand automated decisions affecting them.
  • Faithfulness — The degree to which an explanation accurately reflects the model's actual reasoning process.

Understanding

There is a fundamental tension in AI between **model complexity and interpretability**: the most accurate models (deep neural networks, gradient boosting ensembles) are the least interpretable, while the most interpretable models (linear regression, shallow decision trees) are often less accurate. XAI attempts to navigate this tension.

  • **Two strategies**:
  • **Intrinsically interpretable models**: Choose model architectures that are interpretable by design. Linear models explain predictions as weighted feature sums. Generalized Additive Models (GAMs) extend this to non-linear feature contributions. Decision trees can be visualized. Rule lists produce human-readable decision logic. For high-stakes decisions, these are often preferable even at some accuracy cost (see the sketch after this list).
  • **Post-hoc explanation**: Train any model, then explain its predictions afterward. SHAP computes each feature's Shapley value — its average marginal contribution across all possible feature orderings — providing a theoretically principled attribution. LIME fits a local linear model around the prediction to approximate the complex model's behavior in that region.
  • **The faithfulness problem**: Post-hoc explanations don't explain the model — they explain a simpler approximation of it. An explanation that looks plausible may not accurately reflect the model's actual reasoning. This is a fundamental limitation of post-hoc XAI.
  • **Explanation types by audience**: Data scientists need feature attributions and global model behavior. Domain experts need contrastive explanations ("why X rather than Y?"). End users affected by decisions need natural-language explanations. Regulators need documentation and audit trails.
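
The contrast between the two strategies is easiest to see in code. Below is a minimal sketch of the "interpretable by design" option using scikit-learn's logistic regression on synthetic data; the feature names are hypothetical and only illustrate how a linear model's prediction decomposes exactly into per-feature contributions.

<syntaxhighlight lang="python">
# Sketch: an intrinsically interpretable model whose coefficients are the explanation.
# Data and feature names are synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["income", "debt_ratio", "age", "num_late_payments"]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Global explanation: one weight per feature, valid for every prediction
for name, w in zip(feature_names, model.coef_[0]):
    print(f"{name}: {w:+.3f}")

# Local explanation: the logit of a single prediction is exactly
# intercept + sum(weight_i * x_i), with no approximation involved
x = X[0]
contributions = model.coef_[0] * x
print(dict(zip(feature_names, contributions.round(3))))
print("logit:", model.intercept_[0] + contributions.sum())
</syntaxhighlight>

Post-hoc methods such as SHAP and LIME (see Applying) reconstruct this kind of additive decomposition approximately for models where it does not exist in closed form.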

Applying

Computing SHAP values for a gradient boosting model:

<syntaxhighlight lang="python">
import shap
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Train a model
X, y = shap.datasets.adult()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = xgb.XGBClassifier(n_estimators=100, max_depth=4)
model.fit(X_train, y_train)

# SHAP explanation
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Global feature importance
shap.summary_plot(shap_values, X_test, plot_type="bar")

# Local explanation for one prediction
shap.waterfall_plot(shap.Explanation(
    values=shap_values[0],
    base_values=explainer.expected_value,
    data=X_test.iloc[0],
    feature_names=X_test.columns.tolist()
))
</syntaxhighlight>
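
LIME approaches the same problem by fitting a local surrogate. Below is a minimal sketch, assuming the lime package is installed, that reuses the model, X_train, and X_test from the SHAP example above; the class labels passed to the explainer are purely illustrative.

<syntaxhighlight lang="python">
# Sketch: a LIME explanation for the XGBoost model trained above.
from lime.lime_tabular import LimeTabularExplainer

lime_explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=["<=50K", ">50K"],   # illustrative labels for the adult dataset
    mode="classification",
)

# Fit a local linear surrogate around a single test instance
exp = lime_explainer.explain_instance(
    X_test.iloc[0].values,
    model.predict_proba,
    num_features=5,
)
print(exp.as_list())  # top features with their local weights
</syntaxhighlight>

Because the surrogate is refit from random perturbations each time, running this twice can yield slightly different weights, which is the stability concern discussed under Evaluating.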

Grad-CAM for CNN visual explanation:

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
model.eval()

# Hooks on the last conv block to capture activations and gradients
activations, gradients = {}, {}
model.layer4[-1].register_forward_hook(lambda m, i, o: activations.update({'feat': o}))
model.layer4[-1].register_full_backward_hook(lambda m, gi, go: gradients.update({'feat': go[0]}))

# input_tensor: a preprocessed image batch of shape (1, 3, 224, 224)
output = model(input_tensor)
model.zero_grad()
output[0, output.argmax()].backward()

# Grad-CAM: channel weights = spatially averaged gradients;
# heatmap = ReLU of the weighted sum of activation maps
weights = gradients['feat'].mean(dim=[2, 3], keepdim=True)
cam = F.relu((weights * activations['feat']).sum(dim=1)).squeeze()  # (7, 7) heatmap
</syntaxhighlight>
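
In practice the coarse heatmap is upsampled to the input resolution and overlaid on the image. A short sketch, reusing cam and input_tensor from the block above:

<syntaxhighlight lang="python">
# Sketch: upsample the 7x7 CAM to the input resolution for overlay on the image.
heatmap = F.interpolate(
    cam[None, None, ...],              # (1, 1, 7, 7)
    size=input_tensor.shape[-2:],      # e.g. (224, 224)
    mode="bilinear",
    align_corners=False,
).squeeze()
heatmap = heatmap / (heatmap.max() + 1e-8)  # normalize to [0, 1] for plotting
</syntaxhighlight>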

XAI method by context

  • Tabular, tree models → SHAP TreeExplainer (exact, fast)
  • Tabular, any model → SHAP KernelExplainer, LIME
  • Image classification → Grad-CAM, SHAP GradientExplainer, Integrated Gradients
  • NLP → SHAP on tokenized input, attention visualization, LIME for text
  • High-stakes decisions → Counterfactual explanations, contrastive explanations
  • Regulatory compliance → Model cards, audit trails, decision logs
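
Counterfactual explanations, listed above for high-stakes decisions, can be produced in many ways; dedicated libraries such as DiCE or Alibi implement principled searches. The following is a deliberately simple brute-force sketch over a single numeric feature of a hypothetical sklearn-style classifier:

<syntaxhighlight lang="python">
# Sketch: brute-force counterfactual search over one feature.
# `model` is any fitted classifier with .predict(); the feature index,
# step size, and data are hypothetical.
import numpy as np

def counterfactual_one_feature(model, x, feature_idx, step, max_steps=200):
    """Smallest increase to x[feature_idx] that flips the predicted class."""
    original = model.predict(x.reshape(1, -1))[0]
    candidate = x.copy().astype(float)
    for i in range(1, max_steps + 1):
        candidate[feature_idx] = x[feature_idx] + i * step
        if model.predict(candidate.reshape(1, -1))[0] != original:
            return candidate, i * step
    return None, None  # no flip found within the search range

# Usage (hypothetical credit model): the returned delta backs statements like
# "if income were $5,000 higher, the decision would be Approved".
# cf, delta = counterfactual_one_feature(model, applicant, feature_idx=0, step=1000.0)
</syntaxhighlight>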

Analyzing

XAI Method Comparison

{| class="wikitable"
! Method !! Model-agnostic !! Local/Global !! Faithfulness !! Computational cost
|-
| SHAP (Tree) || Tree models only || Both || High || Low
|-
| SHAP (Kernel) || Yes || Local || Medium || Very high
|-
| LIME || Yes || Local || Medium || Medium
|-
| Grad-CAM || Neural nets only || Local || Medium || Low
|-
| Anchors || Yes || Local || High || Medium
|-
| Inherently interpretable model || No (design choice) || Both || Perfect || Very low
|}

Failure modes:

  • Unfaithful explanations — explanations that look convincing but do not reflect the model's actual reasoning.
  • Adversarial explanation manipulation — explanations can be crafted to look fair while hiding discriminatory behavior.
  • Cognitive overload — too many feature attributions confuse rather than clarify.
  • False confidence — post-hoc explanations for high-stakes decisions can create unwarranted trust in models that remain effectively unexplainable.

Evaluating

Expert XAI evaluation: (1) **Faithfulness** — does removing the features with the highest SHAP values degrade performance proportionally? (2) **Stability** — does LIME/SHAP produce consistent explanations for similar inputs? (3) **Plausibility** — do domain experts find the explanations credible? (4) **Actionability** — can users act on the explanation? Counterfactual explanations are typically the most actionable. (5) **User studies** — measure whether explanations improve human task performance, not just perceived trust.
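
The faithfulness criterion in (1) can be automated as a deletion test. A sketch assuming an sklearn-style classifier, a pandas DataFrame X_eval with labels y_eval, and a SHAP value matrix of shape (n_samples, n_features) computed beforehand:

<syntaxhighlight lang="python">
# Sketch: deletion-style faithfulness check. Mask each row's top-k attributed
# features with the column mean and measure the accuracy drop.
import numpy as np
from sklearn.metrics import accuracy_score

def deletion_test(model, X_eval, y_eval, shap_matrix, top_k=3):
    baseline = accuracy_score(y_eval, model.predict(X_eval))
    X_masked = X_eval.copy().astype(float)
    col_means = X_eval.mean()
    # Top-k features per row, ranked by absolute SHAP value
    top_idx = np.argsort(-np.abs(shap_matrix), axis=1)[:, :top_k]
    for row, cols in enumerate(top_idx):
        X_masked.iloc[row, cols] = col_means.iloc[cols].values
    masked = accuracy_score(y_eval, model.predict(X_masked))
    return baseline, masked  # a faithful attribution should show a clear drop

# baseline, masked = deletion_test(model, X_test, y_test, shap_values, top_k=3)
</syntaxhighlight>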

Creating

Designing an explainability framework for a production decision system: (1) Choose the simplest model that meets accuracy requirements — interpretable by default is better than explained after the fact. (2) If a complex model is necessary, generate SHAP values for every prediction and log them. (3) Build a user-facing explanation UI that shows the top three positive and negative factors in plain language (sketched below). (4) Generate counterfactual explanations for rejected applications: "If your income were $5,000 higher, the decision would be Approved." (5) Implement an explanation audit trail for regulatory compliance. (6) Run periodic faithfulness audits to verify that explanations align with actual model behavior.
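
A sketch of step (3), turning a per-prediction SHAP vector into a short plain-language summary; the feature names, values, and phrasing are illustrative:

<syntaxhighlight lang="python">
# Sketch: render one prediction's SHAP values as a plain-language summary
# of the top positive and negative factors. Names and values are illustrative.
import numpy as np

def explain_in_plain_language(shap_row, feature_names, top_n=3):
    order = np.argsort(shap_row)
    pushed_up = [feature_names[i] for i in order[::-1][:top_n] if shap_row[i] > 0]
    pushed_down = [feature_names[i] for i in order[:top_n] if shap_row[i] < 0]
    lines = []
    if pushed_up:
        lines.append("Factors that raised the score: " + ", ".join(pushed_up))
    if pushed_down:
        lines.append("Factors that lowered the score: " + ", ".join(pushed_down))
    return "\n".join(lines)

shap_row = np.array([0.42, -0.31, 0.05, -0.18])
feature_names = ["income", "debt_ratio", "age", "num_late_payments"]
print(explain_in_plain_language(shap_row, feature_names))
</syntaxhighlight>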