Abstract
The rapid adoption of LLM-based agents raises new challenges for transparency, reproducibility, and governance. Existing artifacts (Model Cards, Datasheets for Datasets) document models and data, but no comparable standard exists for documenting the operational characteristics of agents. We introduce Agent Cards, a lightweight artifact that captures roles, memory, tool integrations, communication protocols, monitoring hooks, governance scope, and evaluation metrics. Standardized descriptions make agents transparent, comparable, and auditable across deployments. We provide a template, an illustrative example, and guidance for integrating Agent Cards into MLOps/LLMOps practices. We argue that Agent Cards can underpin future work on agent ledgers, audit bundles, and maturity frameworks, giving practitioners and researchers a shared vocabulary for the responsible operationalization of agentic AI.
Proposed Agent Card Template
| Section | Description |
|---|---|
| Agent Version | Semantic version of the agent release (e.g., 1.2). |
| Agent Name | Identifier of the agent. |
| Agent Role(s) | Planner, Executor, Critic, Orchestrator (list specific roles). |
| Inputs | Text files, APIs, structured and unstructured data. |
| Outputs | API responses or text. |
| Memory | Short-term: current turn/context-window profile. Long-term: persistent stores/indexes and their TTL. |
| Tools/Functions | Capabilities the agent can invoke beyond its core LLM (calculators, retrieval, external APIs, spreadsheets, domain tools). For each tool, include its type, its purpose, and how it extends the agent's abilities. |
| Communication | Human interface (chat/UI); agent-to-agent protocols; message schemas/versions; handoff/approval policies. |
| Monitoring | Logged metrics (latency, token usage, error rate); trace IDs; inference profile/feature flags; SLOs and alert routes. |
| Governance | Safety filters/guardrails; PII/PHI handling; data retention and access control; approvals and audit checkpoints. |
| Versioning | Release tag/date; prompt hash; toolchain/SBOM; external dependency versions; overall reproducibility hash. |
| Known Limitations | Scope boundaries; partial automation notes; brittleness/non-determinism sources (e.g., upstream API variability). |
| Evaluation | Benchmarks/KPIs (e.g., RAG quality, long-context stress); calibration/abstention policy; evaluation datasets/snapshots; last run date and results. |
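Each row of the table above corresponds to a top-level key in the YAML template below. As a minimal sketch, assuming a card has already been parsed into a plain Python `dict` (for example by a YAML loader), the following reports which sections are missing or left empty. The `REQUIRED_SECTIONS` tuple and the `missing_sections` helper are hypothetical illustrations, not part of the `agentcard` package's API.

```python
# Hypothetical: top-level keys of the Agent Card YAML template,
# one per row of the section table.
REQUIRED_SECTIONS = (
    "agent_version",
    "agent_name",
    "agent_roles",
    "inputs",
    "outputs",
    "memory",
    "tools_functions",
    "communication",
    "monitoring",
    "governance",
    "versioning",
    "known_limitations",
    "evaluation",
)


def missing_sections(card: dict) -> list[str]:
    """Return required sections that are absent or empty in `card`."""
    # None, "", [] and {} all count as "not filled in".
    return [key for key in REQUIRED_SECTIONS if card.get(key) in (None, "", [], {})]


if __name__ == "__main__":
    # A deliberately incomplete card: roles left empty, most sections absent.
    partial = {"agent_version": "1.2", "agent_name": "TaxAdvisorBot", "agent_roles": []}
    print(missing_sections(partial))
```

Treating empty values as missing catches placeholders that were deleted but never filled in, which a plain key-presence check would miss.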
⚙️ Installation
```bash
pip install agentcard
```
🚀 Quick Start
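```python
from agentcard import AgentCard

# Load an Agent Card from its YAML description
card = AgentCard.from_yaml("example.yaml")
print(card.name)  # TaxAdvisorBot

# Register the agent with Phoenix observability
card.register_to_phoenix()
```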
Output
```text
Registered agent agent-001 with Phoenix observability.
```
```yaml
# WIP
agent_version: "1.2"  # Semantic version of the agent release
agent_name: "<Agent Name>"  # Identifier of the agent
agent_roles:  # Planner, Executor, Critic, Orchestrator (list specific roles)
  - "<role-1>"
  - "<role-2>"
inputs:  # Text files, APIs, structured/unstructured data
  - "<input-type-1>"
  - "<input-type-2>"
outputs:  # API responses or text
  - "<output-type-1>"
  - "<output-type-2>"
memory:
  short_term: "<context window / turn profile>"
  long_term: "<stores/indexes/TTL>"
tools_functions:  # Each tool’s type, purpose, and how it extends abilities
  - name: "<tool-name>"
    type: "<api|retrieval|calculator|spreadsheet|custom>"
    purpose: "<what it’s for>"
    extends: "<how it augments the agent>"
communication:
  human_interface: "<chat|UI|api>"
  agent_protocols:  # A2A/ACP/etc., plus schema/version if relevant
    - protocol: "<protocol-name>"
      version: "<x.y>"
      message_schema: "<link/ID>"
  handoff_policy: "<rules for handoffs/approvals>"
monitoring:
  metrics:  # latency, token usage, error rate, etc.
    latency_ms_p95: "<target or observed>"
    token_usage_avg: "<value/unit>"
    error_rate: "<value>"
  tracing: "<trace IDs / correlation IDs available?>"
  inference_profile: "<model/profile/feature flags>"
  slos:  # SLOs and alert routes
    - name: "<SLO name>"
      objective: "<e.g., p95<=3000ms>"
      alert_route: "<pager/webhook>"
governance:
  safety_filters: "<guardrails applied>"
  pii_phi_handling: "<masking/minimization rules>"
  data_retention: "<policy/TTL>"
  access_control: "<RBAC/ABAC rules>"
  approvals_audit: "<checkpoints / reviewers>"
versioning:
  release_tag: "<vX.Y.Z / date>"
  prompt_hash: "<hash>"
  sbom_toolchain: "<SBOM or dependency set>"
  dependency_versions:
    - "<name@version>"
  reproducibility_hash: "<overall repro hash>"
known_limitations:
  - "<scope boundary or brittleness source>"
  - "<non-determinism notes (e.g., upstream API variability)>"
evaluation:
  benchmarks_kpis:  # e.g., RAG quality, long-context stress
    - "<benchmark-or-KPI>"
  calibration_abstention_policy: "<rules>"
  datasets_snapshots:
    - name: "<dataset>"
      snapshot_date: "<YYYY-MM-DD>"
  last_run_date: "<YYYY-MM-DD>"
  results_summary: "<short text or key metrics>"
```
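The template leaves open how `prompt_hash` and `reproducibility_hash` are computed. One possible approach, sketched here as an assumption rather than the authors' method, is to take the SHA-256 of the prompt text and then hash a canonical JSON serialization (sorted keys, fixed separators) of the fields that influence behavior: the prompt hash, the SBOM reference, the dependency pins, and the inference profile. The helper names and the choice of fields are hypothetical.

```python
import hashlib
import json


def prompt_hash(prompt_text: str) -> str:
    """Hypothetical helper: hex SHA-256 of the UTF-8 encoded prompt."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def reproducibility_hash(versioning: dict, inference_profile: str) -> str:
    """Hash a canonical JSON view of the fields that affect agent behavior.

    Assumption: the overall hash covers the prompt hash, the SBOM reference,
    the dependency pins, and the inference profile.
    """
    material = {
        "prompt_hash": versioning["prompt_hash"],
        "sbom_toolchain": versioning["sbom_toolchain"],
        # Sorted so that the listed order of pins does not change the digest.
        "dependency_versions": sorted(versioning["dependency_versions"]),
        "inference_profile": inference_profile,
    }
    # sort_keys + fixed separators give a byte-stable serialization.
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    versioning = {
        "prompt_hash": prompt_hash("<system prompt text>"),  # placeholder prompt
        "sbom_toolchain": "sbom_2025_09.json",
        "dependency_versions": ["pydantic@2.8.0", "httpx@0.27.0"],
    }
    print(reproducibility_hash(versioning, "claude-sonnet-4 profile: prod"))
```

Because both the JSON keys and the dependency pins are sorted, reordering entries in a card leaves the digest unchanged, while any change to a pinned version, the SBOM, the profile, or the prompt produces a different one.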
Example
```yaml
agent_version: "1.2"
agent_name: "TaxAdvisorBot"
agent_roles: ["Executor", "Critic"]
inputs: ["user_text", "SAT_CFDI_XML", "CSV"]
outputs: ["explanations", "checklists", "spreadsheets"]
memory:
  short_term: "16k context window"
  long_term: "mx_tax_index_v2025_08 (TTL 30 days)"
tools_functions:
  - name: "sat_api"
    type: "api"
    purpose: "Look up filings and validate forms"
    extends: "Adds authoritative tax data to answers"
  - name: "calc"
    type: "calculator"
    purpose: "Compute withholdings and totals"
    extends: "Reliable numeric operations"
communication:
  human_interface: "chat"
  agent_protocols:
    - protocol: "A2A"
      version: "1.0"
      message_schema: "schemas/a2a_v1.json"
  handoff_policy: "Escalate to AccountingBot for reconciliation tasks"
monitoring:
  metrics:
    latency_ms_p95: "≤3000"
    token_usage_avg: "1.2k tokens/turn"
    error_rate: "≤2%"
  tracing: "trace_id, span_id"
  inference_profile: "claude-sonnet-4 profile: prod"
  slos:
    - name: "Tool success"
      objective: "≥0.90"
      alert_route: "grafana:webhook/prod"
governance:
  safety_filters: "regex & PII scrubber"
  pii_phi_handling: "mask logs; partial redaction"
  data_retention: "30 days"
  access_control: "RBAC: analyst, admin"
  approvals_audit: "monthly review; change log"
versioning:
  release_tag: "v1.2.0 (2025-09-01)"
  prompt_hash: "p_8f3c…"
  sbom_toolchain: "sbom_2025_09.json"
  dependency_versions: ["pydantic@2.8.0", "httpx@0.27.0"]
  reproducibility_hash: "sha256:7e4a…"
known_limitations:
  - "Not legal representation"
  - "SAT API rate limits can delay responses"
evaluation:
  benchmarks_kpis: ["RAG quality@RAGAS", "form_validation_acc"]
  calibration_abstention_policy: "defer on low confidence"
  datasets_snapshots:
    - name: "cases_50"
      snapshot_date: "2025-08-01"
  last_run_date: "2025-09-01"
  results_summary: "Form validation accuracy 0.92; tool_success 0.91"
```
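The example expresses targets as strings such as `"≤3000"`, `"≤2%"`, and `"≥0.90"`. A minimal sketch, assuming that convention (a `≤`/`≥` or `<=`/`>=` prefix, a number, an optional trailing `%` read as a fraction), of parsing such an objective and checking an observed value against it. The helper names are hypothetical, and the only observed value used is the `tool_success 0.91` already reported in `results_summary`. Forms with a metric prefix or unit, such as the template's `p95<=3000ms`, are deliberately rejected rather than guessed at.

```python
import re

# Hypothetical pattern for objectives like "≤3000", "≤2%", "≥0.90".
_OBJECTIVE = re.compile(r"^\s*(≤|≥|<=|>=)\s*([0-9]*\.?[0-9]+)\s*(%?)\s*$")


def parse_objective(text: str) -> tuple[str, float]:
    """Return (operator, threshold) with operator normalized to '<=' or '>='."""
    match = _OBJECTIVE.match(text)
    if match is None:
        raise ValueError(f"unrecognized objective: {text!r}")
    op, number, percent = match.groups()
    # A trailing '%' turns "2%" into the fraction 0.02.
    threshold = float(number) / 100 if percent else float(number)
    return ("<=" if op in ("≤", "<=") else ">="), threshold


def meets_objective(observed: float, objective: str) -> bool:
    """True if `observed` satisfies the parsed objective."""
    op, threshold = parse_objective(objective)
    return observed <= threshold if op == "<=" else observed >= threshold


if __name__ == "__main__":
    # tool_success 0.91 (from results_summary) against the "Tool success" SLO.
    print(meets_objective(0.91, "≥0.90"))  # True
```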
📖 Citation
Urteaga-Reyesvera, J. C., & Lopez Murphy, J. J. (2025).
Agent Cards: A Documentation Standard for Operational AI Agents.
In MICAI 2025 Workshops (Lecture Notes in Artificial Intelligence).
Springer Nature Switzerland AG. (Forthcoming)
BibTeX
```bibtex
@inproceedings{urteaga2025agentcards,
  author    = {Urteaga-Reyesvera, J. Carlos and Lopez Murphy, Juan Jose},
  title     = {Agent Cards: A Documentation Standard for Operational AI Agents},
  booktitle = {Proceedings of the MICAI 2025 Workshops},
  series    = {Lecture Notes in Artificial Intelligence},
  publisher = {Springer Nature Switzerland AG},
  note      = {Forthcoming},
  year      = {2025},
}
```