<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
{{BloomIntro}}
Explainable AI (XAI) is the set of methods, techniques, and tools that make the outputs, decisions, and internal workings of AI systems understandable to humans. As AI is deployed in high-stakes domains – medical diagnosis, credit scoring, criminal justice, autonomous systems – the ability to explain why a model made a particular decision becomes essential for trust, accountability, debugging, and regulatory compliance. XAI bridges the gap between powerful black-box models and the human need to understand, audit, and challenge automated decisions.
</div>

__TOC__

<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Explainability''' – The degree to which a model's behavior can be understood and explained to humans.
* '''Interpretability''' – The degree to which the internal mechanisms of a model can be directly understood; often used interchangeably with explainability.
* '''Black-box model''' – A model whose internal workings are opaque; predictions are produced but the reasoning is not accessible (deep neural networks, ensemble methods).
* '''Glass-box model''' – An inherently interpretable model whose reasoning is transparent by design (linear regression, decision trees, rule lists).
* '''Post-hoc explanation''' – An explanation generated after a model is trained, attempting to approximate or explain its behavior (SHAP, LIME).
* '''SHAP (SHapley Additive exPlanations)''' – A game-theoretic framework that assigns each feature a contribution value for each prediction.
* '''LIME (Local Interpretable Model-agnostic Explanations)''' – A technique that approximates complex model behavior locally with a simple interpretable model.
* '''Feature importance''' – A ranking of input features by their overall contribution to the model's predictions.
* '''Saliency map''' – A visualization highlighting which input pixels or regions most influenced a neural network's output on an image.
* '''Grad-CAM''' – A gradient-based visualization technique showing which image regions a CNN attended to for a prediction.
* '''Counterfactual explanation''' – An explanation of the form "if X had been different, the prediction would have changed to Y."
* '''Anchors''' – Rule-based explanations that identify sufficient conditions guaranteeing a prediction regardless of changes to other features.
* '''Model card''' – A documentation framework that describes a model's intended use, performance, and limitations to stakeholders.
* '''Right to explanation''' – A legal concept, commonly associated with GDPR Article 22, granting individuals the right to understand automated decisions that affect them.
* '''Faithfulness''' – The degree to which an explanation accurately reflects the model's actual reasoning process.
</div>

<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Understanding</span> ==
There is a fundamental tension in AI between '''model complexity and interpretability''': the most accurate models (deep neural networks, gradient boosting ensembles) tend to be the least interpretable, while the most interpretable models (linear regression, shallow decision trees) are often less accurate. XAI attempts to navigate this tension.
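
To make the tension concrete, the sketch below trains a depth-3 decision tree, whose entire decision logic can be printed as rules, next to a gradient boosting ensemble that is usually somewhat more accurate but offers no readable logic. This is our illustration, not a prescribed workflow; it assumes scikit-learn, and the bundled breast-cancer dataset is chosen purely for convenience.
<syntaxhighlight lang="python">
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Any tabular dataset works; this one is just a convenient built-in
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Glass box: a shallow tree whose full decision logic fits on one screen
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(export_text(tree, feature_names=list(X.columns)))
print("tree accuracy:", tree.score(X_test, y_test))

# Black box: typically a little more accurate, with no readable decision logic
gbm = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("boosting accuracy:", gbm.score(X_test, y_test))
</syntaxhighlight>
Whatever accuracy gap appears between the two models is the price of staying interpretable; the two strategies below are different ways of managing that price.
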
'''Two strategies''':

'''Intrinsically interpretable models''': Choose model architectures that are interpretable by design. Linear models explain predictions as weighted feature sums. Generalized Additive Models (GAMs) extend this to non-linear feature contributions. Decision trees can be visualized. Rule lists produce human-readable decision logic. For high-stakes decisions, these are often preferable even at some accuracy cost.

'''Post-hoc explanation''': Train any model, then explain its predictions afterward. SHAP computes each feature's Shapley value – its average marginal contribution across all possible feature orderings – providing a theoretically principled attribution. LIME fits a local linear model around the prediction to approximate the complex model's behavior in that region.

'''The faithfulness problem''': Post-hoc explanations don't explain the model itself – they explain a simpler approximation of it. An explanation that looks plausible may not accurately reflect the model's actual reasoning. This is known as the faithfulness problem and is a fundamental limitation of post-hoc XAI.

'''Explanation types by audience''': Data scientists need feature attributions and global model behavior. Domain experts need contrastive explanations ("why X rather than Y?"). End users affected by decisions need natural-language explanations. Regulators need documentation and audit trails.
</div>

<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Applying</span> ==
'''Computing SHAP values for a gradient boosting model:'''
<syntaxhighlight lang="python">
import shap
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Train a model on SHAP's bundled Adult census dataset
X, y = shap.datasets.adult()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = xgb.XGBClassifier(n_estimators=100, max_depth=4)
model.fit(X_train, y_train)

# TreeExplainer computes exact Shapley values for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Global feature importance (mean |SHAP| per feature)
shap.summary_plot(shap_values, X_test, plot_type="bar")

# Local explanation for one prediction
shap.waterfall_plot(shap.Explanation(
    values=shap_values[0],
    base_values=explainer.expected_value,
    data=X_test.iloc[0],
    feature_names=X_test.columns.tolist()
))
</syntaxhighlight>

'''Grad-CAM for CNN visual explanation:'''
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
model.eval()

# Hooks on the last conv block capture activations and gradients
activations, gradients = {}, {}
model.layer4[-1].register_forward_hook(lambda m, i, o: activations.update({'feat': o}))
model.layer4[-1].register_full_backward_hook(lambda m, gi, go: gradients.update({'feat': go[0]}))

# input_tensor stands in for a preprocessed image batch of shape (1, 3, 224, 224)
input_tensor = torch.randn(1, 3, 224, 224)
output = model(input_tensor)
model.zero_grad()
output[0, output.argmax()].backward()

# Grad-CAM: per-channel weights from spatially averaged gradients,
# then a weighted sum over channels, ReLU'd into a spatial heatmap
weights = gradients['feat'].mean(dim=[2, 3], keepdim=True)
cam = F.relu((weights * activations['feat']).sum(dim=1)).squeeze()  # HxW heatmap
</syntaxhighlight>

; XAI method by context
: '''Tabular, tree models''' – SHAP TreeExplainer (exact, fast)
: '''Tabular, any model''' – SHAP KernelExplainer, LIME
: '''Image classification''' – Grad-CAM, SHAP GradientExplainer, Integrated Gradients
: '''NLP''' – SHAP on tokenized input, attention visualization, LIME for text
: '''High-stakes decisions''' – Counterfactual explanations, contrastive explanations (a minimal sketch follows this list)
: '''Regulatory compliance''' – Model cards, audit trails, decision logs
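
'''Generating a counterfactual explanation:''' A deliberately naive brute-force search, reusing the <code>model</code> and <code>X_test</code> from the SHAP example above. This is our sketch, not an established API: production systems would use a dedicated library such as DiCE, the single-feature candidate grid is an illustrative assumption, and the column name follows <code>shap.datasets.adult()</code>.
<syntaxhighlight lang="python">
import numpy as np

def simple_counterfactual(model, row, feature, candidates):
    """Find the smallest change to one feature that flips the prediction.

    Brute-force illustration only: real counterfactual methods search many
    features jointly and enforce plausibility constraints.
    """
    original = model.predict(row.to_frame().T)[0]
    # Try candidate values nearest the current value first
    for value in sorted(candidates, key=lambda v: abs(v - row[feature])):
        modified = row.copy()
        modified[feature] = value
        if model.predict(modified.to_frame().T)[0] != original:
            return feature, value
    return None  # no single-feature change flips the decision

# Example: how large a change in weekly hours would flip this prediction?
row = X_test.iloc[0]
print(simple_counterfactual(model, row, "Hours per week", np.arange(10, 90, 5)))
</syntaxhighlight>
A result such as <code>('Hours per week', 45)</code> maps directly onto user-facing wording: "if your hours per week were 45, the decision would change."
</div>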
<div style="background-color: #8B4500; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Analyzing</span> ==
{| class="wikitable"
|+ XAI Method Comparison
! Method !! Model-Agnostic !! Local/Global !! Faithfulness !! Computational Cost
|-
| SHAP (Tree) || Tree models only || Both || High || Low
|-
| SHAP (Kernel) || Yes || Local || Medium || Very high
|-
| LIME || Yes || Local || Medium || Medium
|-
| Grad-CAM || Neural nets || Local || Medium || Low
|-
| Anchors || Yes || Local || High || Medium
|-
| Inherently interpretable model || No (design choice) || Both || Perfect || Very low
|}

'''Failure modes''': Unfaithful explanations that look convincing but don't reflect actual model reasoning. Adversarial explanation manipulation – explanations can be crafted to look fair while hiding discriminatory behavior. Cognitive overload – too many feature attributions confuse rather than clarify. Post-hoc explanations for high-stakes decisions that create false confidence in models nobody can actually explain.
</div>

<div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Evaluating</span> ==
Expert XAI evaluation:
# '''Faithfulness''' – does removing the features with the highest SHAP values degrade performance proportionally?
# '''Stability''' – do LIME and SHAP produce consistent explanations for similar inputs?
# '''Plausibility''' – do domain experts find the explanations credible?
# '''Actionability''' – can users act on the explanation? Counterfactual explanations are the most actionable.
# '''User study''' – measure whether explanations improve human task performance, not just perceived trust.
</div>

<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Creating</span> ==
Designing an explainability framework for a production decision system:
# Choose the simplest model that meets accuracy requirements – interpretable by default is better than explained after the fact.
# If a complex model is necessary, generate SHAP values for every prediction and log them (a minimal sketch follows this list).
# Build a user-facing explanation UI: show the top three positive and negative factors in plain language.
# Generate counterfactual explanations for rejected applications: "If your income were $5,000 higher, the decision would be Approved."
# Implement an explanation audit trail for regulatory compliance.
# Run periodic faithfulness audits: verify that explanations align with actual model behavior.
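
Step 2 is mechanical enough to sketch. The illustration below reuses the <code>model</code> and <code>explainer</code> from the Applying section; the JSONL log format and field names are our assumptions for illustration, not a standard.
<syntaxhighlight lang="python">
import json
import time

def predict_and_log(model, explainer, row, log_path="explanations.jsonl"):
    """Score one row and append its SHAP explanation to an audit log."""
    frame = row.to_frame().T
    pred = model.predict(frame)[0]
    contribs = explainer.shap_values(frame)[0]
    record = {
        "timestamp": time.time(),
        "prediction": int(pred),
        "base_value": float(explainer.expected_value),
        # Strongest factors by absolute contribution (cf. the top-3 UI in step 3)
        "attributions": sorted(
            zip(row.index.tolist(), map(float, contribs)),
            key=lambda kv: abs(kv[1]),
            reverse=True,
        )[:6],
    }
    with open(log_path, "a") as f:  # append-only log, one JSON record per line
        f.write(json.dumps(record) + "\n")
    return pred, record

# Usage, reusing objects from the Applying section:
# predict_and_log(model, explainer, X_test.iloc[0])
</syntaxhighlight>
Logging the attribution alongside each prediction is what later makes the faithfulness audits in step 6 possible.

[[Category:Artificial Intelligence]]
[[Category:Explainable AI]]
[[Category:Machine Learning]]
</div>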