Abstract

The rapid adoption of LLM-based agents raises new challenges for transparency, reproducibility, and governance. Existing artifacts (Model Cards, Data Sheets) document models and data, but there is no standard for the operational characteristics of agents. We introduce Agent Cards, a lightweight artifact that captures roles, memory, tool integrations, communication protocols, monitoring hooks, governance scope, and evaluation metrics. Standardized descriptions make agents transparent, comparable, and auditable across deployments. We provide a template, an illustrative example, and guidance for integrating Agent Cards into MLOps/LLMOps practices. We argue that Agent Cards can underpin future work on agent ledgers, audit bundles, and maturity frameworks, giving practitioners and researchers a shared vocabulary for responsible operationalization of agentic AI.



Proposed Agent Card Template

| Section | Description |
| --- | --- |
| Agent Version | Semantic version of the agent release (e.g., 1.2). |
| Agent Name | Identifier of the agent. |
| Agent Role(s) | Planner, Executor, Critic, Orchestrator (list the specific roles). |
| Inputs | Text files, APIs, structured and unstructured data. |
| Outputs | API responses or text. |
| Memory | Short-term: current turn/context-window profile. Long-term: persistent stores/indexes and their TTLs. |
| Tools/Functions | Capabilities the agent can invoke beyond its core LLM (calculators, retrieval, external APIs, spreadsheets, domain tools). Include each tool's type, purpose, and how it extends the agent's abilities. |
| Communication | Human interface (chat/UI); agent-to-agent protocols; message schemas/versions; handoff/approval policies. |
| Monitoring | Logged metrics (latency, token usage, error rate); trace IDs; inference profile/feature flags; SLOs and alert routes. |
| Governance | Safety filters/guardrails; PII/PHI handling; data retention and access control; approvals and audit checkpoints. |
| Versioning | Release tag/date; prompt hash; toolchain/SBOM; external dependency versions; overall reproducibility hash. |
| Known Limitations | Scope boundaries; partial-automation notes; sources of brittleness/non-determinism (e.g., upstream API variability). |
| Evaluation | Benchmarks/KPIs (e.g., RAG quality, long-context stress); calibration/abstention policy; evaluation datasets/snapshots; last run date and results. |

⚙️ Installation

pip install agentcard

🚀 Quick Start

from agentcard import AgentCard

# Load a card from its YAML definition
card = AgentCard.from_yaml("example.yaml")
print(card.name)  # TaxAdvisorBot

# Register the agent with Phoenix observability
card.register_to_phoenix()

Output

Registered agent agent-001 with Phoenix observability.
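To see what a card object might look like without installing the package, here is a minimal stand-in sketch. The `AgentCard` dataclass and `from_dict` constructor below are illustrative assumptions that mirror a few template sections; they are not the library's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class AgentCard:
    # Illustrative stand-in; field names mirror the template sections.
    agent_name: str
    agent_version: str
    agent_roles: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict) -> "AgentCard":
        # Pull only the fields this sketch models; a real loader
        # would cover every template section.
        return cls(
            agent_name=data["agent_name"],
            agent_version=data["agent_version"],
            agent_roles=list(data.get("agent_roles", [])),
        )

card = AgentCard.from_dict({
    "agent_name": "TaxAdvisorBot",
    "agent_version": "1.2",
    "agent_roles": ["Executor", "Critic"],
})
print(card.agent_name)  # TaxAdvisorBot
```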
YAML Template

# WIP
agent_version: "1.2"            # Semantic version of the agent release
agent_name: "<Agent Name>"      # Identifier of the agent
agent_roles:                    # Planner, Executor, Critic, Orchestrator (list specific roles)
  - "<role-1>"
  - "<role-2>"

inputs:                         # Text files, APIs, structured/unstructured data
  - "<input-type-1>"
  - "<input-type-2>"

outputs:                        # API responses or text
  - "<output-type-1>"
  - "<output-type-2>"

memory:
  short_term: "<context window / turn profile>"
  long_term: "<stores/indexes/TTL>"

tools_functions:                # Each tool’s type, purpose, and how it extends abilities
  - name: "<tool-name>"
    type: "<api|retrieval|calculator|spreadsheet|custom>"
    purpose: "<what it’s for>"
    extends: "<how it augments the agent>"

communication:
  human_interface: "<chat|UI|api>"
  agent_protocols:              # A2A/ACP/etc., plus schema/version if relevant
    - protocol: "<protocol-name>"
      version: "<x.y>"
      message_schema: "<link/ID>"
  handoff_policy: "<rules for handoffs/approvals>"

monitoring:
  metrics:                      # latency, token usage, error rate, etc.
    latency_ms_p95: "<target or observed>"
    token_usage_avg: "<value/unit>"
    error_rate: "<value>"
  tracing: "<trace IDs / correlation IDs available?>"
  inference_profile: "<model/profile/feature flags>"
  slos:                         # SLOs and alert routes
    - name: "<SLO name>"
      objective: "<e.g., p95<=3000ms>"
      alert_route: "<pager/webhook>"

governance:
  safety_filters: "<guardrails applied>"
  pii_phi_handling: "<masking/minimization rules>"
  data_retention: "<policy/TTL>"
  access_control: "<RBAC/ABAC rules>"
  approvals_audit: "<checkpoints / reviewers>"

versioning:
  release_tag: "<vX.Y.Z / date>"
  prompt_hash: "<hash>"
  sbom_toolchain: "<SBOM or dependency set>"
  dependency_versions:
    - "<name@version>"
  reproducibility_hash: "<overall repro hash>"

known_limitations:
  - "<scope boundary or brittleness source>"
  - "<non-determinism notes (e.g., upstream API variability)>"

evaluation:
  benchmarks_kpis:              # e.g., RAG quality, long-context stress
    - "<benchmark-or-KPI>"
  calibration_abstention_policy: "<rules>"
  datasets_snapshots:
    - name: "<dataset>"
      snapshot_date: "<YYYY-MM-DD>"
  last_run_date: "<YYYY-MM-DD>"
  results_summary: "<short text or key metrics>"
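A card written against this template can be checked for completeness once parsed into a dict (e.g., with `yaml.safe_load`). The validator below is a minimal sketch, not part of the `agentcard` package; the section names come from the template above.

```python
# Minimal completeness check for an Agent Card parsed into a dict.
# Section names follow the YAML template above.
REQUIRED_SECTIONS = [
    "agent_version", "agent_name", "agent_roles", "inputs", "outputs",
    "memory", "tools_functions", "communication", "monitoring",
    "governance", "versioning", "known_limitations", "evaluation",
]

def missing_sections(card: dict) -> list[str]:
    """Return the template sections absent from the card."""
    return [s for s in REQUIRED_SECTIONS if s not in card]

incomplete = {"agent_name": "demo", "agent_version": "0.1"}
print(missing_sections(incomplete)[:3])  # ['agent_roles', 'inputs', 'outputs']
```

A check like this can run in CI so that an agent cannot ship with an incomplete card.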

Example

agent_version: "1.2"
agent_name: "TaxAdvisorBot"
agent_roles: ["Executor", "Critic"]

inputs: ["user_text", "SAT_CFDI_XML", "CSV"]
outputs: ["explanations", "checklists", "spreadsheets"]

memory:
  short_term: "16k context window"
  long_term: "mx_tax_index_v2025_08 (TTL 30 days)"

tools_functions:
  - name: "sat_api"
    type: "api"
    purpose: "Lookup filings and validate forms"
    extends: "Adds authoritative tax data to answers"
  - name: "calc"
    type: "calculator"
    purpose: "Compute withholdings and totals"
    extends: "Reliable numeric operations"

communication:
  human_interface: "chat"
  agent_protocols:
    - protocol: "A2A"
      version: "1.0"
      message_schema: "schemas/a2a_v1.json"
  handoff_policy: "Escalate to AccountingBot for reconciliation tasks"

monitoring:
  metrics:
    latency_ms_p95: "≤3000"
    token_usage_avg: "1.2k tokens/turn"
    error_rate: "≤2%"
  tracing: "trace_id, span_id"
  inference_profile: "claude-sonnet-4 profile: prod"
  slos:
    - name: "Tool success"
      objective: "≥0.90"
      alert_route: "grafana:webhook/prod"

governance:
  safety_filters: "regex & PII scrubber"
  pii_phi_handling: "mask logs; partial redaction"
  data_retention: "30 days"
  access_control: "RBAC: analyst, admin"
  approvals_audit: "monthly review; change log"

versioning:
  release_tag: "v1.2.0 (2025-09-01)"
  prompt_hash: "p_8f3c…"
  sbom_toolchain: "sbom_2025_09.json"
  dependency_versions: ["pydantic@2.8.0", "httpx@0.27.0"]
  reproducibility_hash: "sha256:7e4a…"

known_limitations:
  - "Not legal representation"
  - "SAT API rate limits can delay responses"

evaluation:
  benchmarks_kpis: ["RAG quality@RAGAS", "form_validation_acc"]
  calibration_abstention_policy: "defer on low confidence"
  datasets_snapshots:
    - name: "cases_50"
      snapshot_date: "2025-08-01"
  last_run_date: "2025-09-01"
  results_summary: "Form validation accuracy 0.92; tool_success 0.91"
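The versioning fields can be tied together mechanically: an overall reproducibility hash can be derived from the release tag, prompt hash, and pinned dependencies. The derivation rule below is an assumption for illustration, not the rule used to produce the example card's digest.

```python
import hashlib

def repro_hash(prompt_hash: str, release_tag: str, deps: list[str]) -> str:
    """Digest the release-identifying fields into one hash.
    Sorting deps makes the digest independent of listing order."""
    material = "\n".join([release_tag, prompt_hash, *sorted(deps)])
    return "sha256:" + hashlib.sha256(material.encode()).hexdigest()

digest = repro_hash(
    prompt_hash="p_8f3c",
    release_tag="v1.2.0",
    deps=["pydantic@2.8.0", "httpx@0.27.0"],
)
print(digest[:7])  # sha256:
```

Recomputing the digest at audit time and comparing it to the card's `reproducibility_hash` detects drift in any of the inputs.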

📖 Citation

Urteaga-Reyesvera, J.C., Lopez Murphy, J.J. (2026). Agent Cards: A Documentation Standard for Operational AI Agents. In: Martínez-Villaseñor, L., et al. Advances in Computational Intelligence. MICAI 2025 International Workshops. MICAI 2025. Lecture Notes in Computer Science, vol 16265. Springer, Cham. https://doi.org/10.1007/978-3-032-17933-3_25

BibTeX

@InProceedings{10.1007/978-3-032-17933-3_25,
author="Urteaga-Reyesvera, J. Carlos
and Lopez Murphy, Juan Jose",
editor="Mart{\'i}nez-Villase{\~{n}}or, Lourdes
and V{\'a}zquez, Roberto A.
and Ochoa-Ruiz, Gilberto
and Montes Rivera, Mart{\'i}n
and Zapotecas-Mart{\'i}nez, Sa{\'u}l
and Barr{\'o}n-Estrada, Mar{\'i}a Luc{\'i}a
and Mezura-Montes, Efr{\'e}n
and Gomez Chavez, Arturo",
title="Agent Cards: A Documentation Standard for Operational AI Agents",
booktitle="Advances in Computational Intelligence. MICAI 2025 International Workshops",
year="2026",
publisher="Springer Nature Switzerland",
address="Cham",
pages="252--261",
abstract="The rapid adoption of large language models (LLMs) into AI agents has created new challenges for transparency, reproducibility, and governance. While prior artifacts such as Model Cards and Data Sheets for Datasets support documentation of models and data, no analogous standard exists for describing the operational characteristics of AI agents. This paper introduces Agent Cards, a structured documentation artifact designed to capture the essential attributes of an agent, including its roles, memory taxonomy, tool integrations, communication protocols, monitoring hooks, governance scope, and evaluation metrics. By standardizing how agents are described, Agent Cards provide a lightweight yet powerful mechanism for enabling transparency, comparability, and auditability across deployments. A template is presented, accompanied by an illustrative example, followed by a discussion of the benefits of adopting Agent Cards within broader MLOps and LLMOps practices. Agent Cards are proposed as a potential foundation for future work on agent ledgers, audit bundles, and maturity frameworks, offering practitioners and researchers a common vocabulary for the responsible operationalization of agentic AI.",
isbn="978-3-032-17933-3"
}