2026 Hermes Agent Skills Advanced Guide:
From SKILL.md to GEPA Self-Evolution and Mac Cloud Production

Developers who already run Hermes Agent hit a ceiling fast: one-shot prompts do not scale, Memory stores facts but not procedures, and dumping SOPs into system context burns tokens every session. Nous Research built Hermes around "the agent that grows with you"—and the engine behind that promise is the Skills system: standardized, evolvable, cross-session procedural memory. This guide goes beyond install docs to cover SKILL.md format, Skill Bundles, conditional activation, Tap publishing, GEPA+DSPy self-evolution, Plugin skills, and the open-source Hub, with an eight-step runbook and FAQ. Confirm field names against Hermes and agentskills.io before you ship. Lease tiers are on the pricing page.

  • Runaway token spend: Pasting a full deploy SOP into the system prompt charges every turn. Skills use progressive loading—zero body cost until activation.
  • Cross-session amnesia: Plain prompts die when the chat ends. Skills and Memory both persist, but Skills teach how to execute, not what to remember.
  • Fragmented workflows: PR review, TDD, and deploy checks live in separate threads. Skill Bundles load the whole stack from one slash command.
  • Missing environment awareness: Loading a DuckDuckGo fallback when paid web_search is already configured wastes context. Conditional activation hides or shows skills by tool availability.
  • Stagnant skill quality: Most teams write SKILL.md once and never iterate. GEPA evolves skill text from execution traces without touching model weights.
  • No team sharing: Skills trapped in a personal folder cannot sync. Tap repos plus hermes skills tap add let the whole org subscribe to one source.
  • Confusion with MCP: Skills are procedural knowledge documents. MCP exposes tool endpoints. One teaches the flow; the other supplies capability—they complement each other.

Five questions this guide answers: How does progressive loading control tokens? How does conditional activation work? How do Bundles fire a full workflow? How does GEPA improve skills over time? Which community Taps are worth subscribing to?

Prompt vs Memory vs Skills: three-way comparison

| Dimension | Plain Prompt | Memory | Skills |
|---|---|---|---|
| Persistence | Current conversation only | Cross-session, permanent | Cross-session, permanent |
| Load timing | Always in context | Auto-injected each session | On demand |
| Token cost | Every turn | Small and stable | Zero body cost before activation |
| Content type | Any intent | User preferences and facts | Procedural steps |
| Maintained by | User manually | Agent automatically | User and Agent |
| Shareability | Hard to reuse | Private to the user | Publishable as community Tap |

Quick mnemonic: Prompt = sticky note (valid this chat only); Memory = notebook (permanent notes, always nearby); Skill = SOP manual (step-by-step playbook, opened when needed).

All Hermes Skills follow the open agentskills.io standard and port across Hermes, Claude Code, and Cursor. Recommended layout:

~/.hermes/skills/my-category/my-skill/
  SKILL.md              main file, target under 500 lines
  references/           API docs and examples, loaded on demand
  templates/            reusable templates
  scripts/              scripts the agent can execute directly

SKILL.md frontmatter example:
---
name: my-skill
description: |
  Use when the user needs to [...].
  Handles [...] and [...].
version: 1.0.0
license: MIT
compatibility: Requires git, docker
allowed-tools: Bash(git:*) Read
metadata:
  hermes:
    tags: [devops, automation]
    category: software-development
    related_skills: [github-pr-workflow]
    requires_toolsets: [terminal]
    fallback_for_toolsets: [web]
---

Progressive Disclosure is the token-control core:

Progressive disclosure load tiers

| Tier | Content | Trigger | Token cost |
|---|---|---|---|
| Level 0 | name + description | Session start, all skills scanned | ~3K total across all skills |
| Level 1 | Full SKILL.md body | /skill-name or LLM relevance match | Depends on file length |
| Level 2 | references/, scripts/ | LLM decides at execution time | Per file, on demand |
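
To make the tiers concrete, here is a rough back-of-the-envelope sketch. The 4-characters-per-token ratio, the skill count, and the file sizes are illustrative assumptions, not Hermes measurements; the helper names are hypothetical.

```python
# Back-of-the-envelope token arithmetic for progressive disclosure.
# Assumption: ~4 characters per token (a rule of thumb, not a Hermes figure).
CHARS_PER_TOKEN = 4


def estimate_tokens(char_count: int) -> int:
    """Estimate tokens from a character count, rounding up."""
    return -(-char_count // CHARS_PER_TOKEN)


def session_cost(level0_chars, activated_body_chars):
    """Sum Level 0 metadata for every skill plus Level 1 bodies for activated ones."""
    level0 = sum(estimate_tokens(c) for c in level0_chars)
    level1 = sum(estimate_tokens(c) for c in activated_body_chars)
    return {"level0": level0, "level1": level1, "total": level0 + level1}


# Hypothetical install: 40 skills, ~300 characters of name + description each,
# and one 8,000-character SKILL.md body activated during the session.
cost = session_cost([300] * 40, [8000])
print(cost)  # {'level0': 3000, 'level1': 2000, 'total': 5000}
```

Under the same assumptions, eagerly loading all 40 bodies at 8,000 characters each would cost about 80,000 tokens, which is the gap progressive loading is meant to close.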

The body should include Overview, When to Use, Procedure, Common Pitfalls, and Verification Checklist. The critical field is description: it is the only Level 0 routing signal. Lead with "Use when..." and stay under 1024 characters. Keep name lowercase with hyphens, max 64 characters.
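
The name and description limits above are easy to check before publishing. The following is a minimal sketch, not the skills-ref validator: the function name is hypothetical, and allowing digits in name is an assumption that the text above does not confirm.

```python
import re

# Lowercase words joined by single hyphens. Allowing digits is an assumption.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def check_frontmatter(name: str, description: str) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(name) > 64:
        problems.append("name exceeds 64 characters")
    if not NAME_PATTERN.match(name):
        problems.append("name must be lowercase with hyphens")
    desc = description.strip()
    if not desc:
        problems.append("description is empty")
    if len(desc) > 1024:
        problems.append("description exceeds 1024 characters")
    if not desc.lower().startswith("use when"):
        problems.append('description should lead with "Use when..."')
    return problems


print(check_frontmatter("my-skill", "Use when the user needs a deploy checklist."))
print(check_frontmatter("My_Skill", "Helps with code."))
```

The first call returns an empty list; the second flags both the name casing and the missing "Use when" lead.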

Skill Bundles (2026) pack multiple skills into a single slash command. Running /bundle-name loads every listed skill at once. Files live at ~/.hermes/skill-bundles/<slug>.yaml:

backend-dev.yaml
name: backend-dev
description: |
  Full backend feature workflow — code review, TDD, and PR management.
skills:
  - github-code-review
  - test-driven-development
  - github-pr-workflow
instruction: |
  Always write failing tests first before implementation.
  Never push directly to main.

Example bundles: research-session combines arxiv, deep-research, plan, and excalidraw; mlops-deploy combines vllm, llama-cpp, github-pr-workflow, and systematic-debugging. Rules: if a Bundle and a single Skill share a name, the Bundle wins; missing skills are skipped with a warning; Bundles do not modify the system prompt, which keeps Prompt Cache intact. Quick create via CLI:

terminal
hermes bundles create backend-dev \
    --skills github-code-review,test-driven-development,github-pr-workflow \
    --instruction "Always write failing tests first"
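
The resolution rules described earlier (a Bundle wins a name clash, missing skills are skipped with a warning) can be sketched as below. This is an illustration of the stated behavior, not Hermes source code; resolve_command and its arguments are hypothetical.

```python
import warnings


def resolve_command(name, bundles, skills):
    """Return the skill names that a /name slash command would load.

    bundles: dict mapping bundle name -> list of skill names
    skills:  set of installed skill names
    """
    if name in bundles:  # a Bundle shadows a single Skill of the same name
        loaded = []
        for skill in bundles[name]:
            if skill in skills:
                loaded.append(skill)
            else:
                warnings.warn(f"bundle '{name}': skill '{skill}' not installed, skipping")
        return loaded
    if name in skills:
        return [name]
    return []


bundles = {
    "backend-dev": ["github-code-review", "test-driven-development", "github-pr-workflow"],
}
installed = {"github-code-review", "github-pr-workflow", "backend-dev"}

# test-driven-development is missing, so it is skipped with a warning.
print(resolve_command("backend-dev", bundles, installed))
```

Note that a Skill named backend-dev is also installed in this example, yet the Bundle takes precedence.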

Conditional Activation lives under metadata.hermes and auto-shows or hides skills by tool availability:

  • requires_toolsets / requires_tools: Hide the skill when listed toolsets or tools are absent.
  • fallback_for_toolsets / fallback_for_tools: Hide the skill when listed toolsets or tools are present (fallback path).

Classic pattern: after setting FIRECRAWL_KEY or BRAVE_SEARCH_KEY, paid web_search activates and duckduckgo-search disappears because of fallback_for_tools: [web_search]. When the API is unavailable, the fallback resurfaces. Platform-aware skills use requires_toolsets: [messaging] with platforms: [telegram, discord]. The hermes skills TUI lets you toggle skills independently for CLI, Telegram, and Discord.
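
The show/hide rules can be modeled as a small predicate. This is a sketch under assumptions: it treats a skill as hidden if any required entry is absent or any fallback target is present, which may differ from how Hermes combines multiple entries. The function name is hypothetical.

```python
def is_visible(hermes_meta: dict, toolsets: set, tools: set) -> bool:
    """Decide whether a skill should be shown, given available toolsets and tools."""
    # Hide when any required toolset or tool is absent.
    if any(ts not in toolsets for ts in hermes_meta.get("requires_toolsets", [])):
        return False
    if any(t not in tools for t in hermes_meta.get("requires_tools", [])):
        return False
    # Hide when a primary toolset or tool this skill falls back for is present.
    if any(ts in toolsets for ts in hermes_meta.get("fallback_for_toolsets", [])):
        return False
    if any(t in tools for t in hermes_meta.get("fallback_for_tools", [])):
        return False
    return True


duckduckgo = {"fallback_for_tools": ["web_search"]}

print(is_visible(duckduckgo, toolsets=set(), tools={"web_search"}))  # False
print(is_visible(duckduckgo, toolsets=set(), tools=set()))           # True
```

With paid web_search configured the fallback skill is hidden; remove the tool and it resurfaces, matching the pattern described above.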

hermes skills install commands
hermes skills install official/research/arxiv
hermes skills install https://example.com/SKILL.md --name my-skill
hermes skills install github:openai/skills/k8s
hermes skills tap add github:my-org/my-skills

Notable open-source skill repositories (re-check Star counts after publish)

| Repository | Highlights |
|---|---|
| ChuckSRQ/awesome-hermes-skills | Production-grade collection: Deep Research, MLOps, Apple integration; 23 skills usable in GitHub Copilot |
| amanning3390/hermeshub | Community registry with security scanning and API marketplace support |
| kevinnft/ai-agent-skills | 191 skills across 28 categories; works with Hermes, Claude Code, and Cursor |
| NousResearch/hermes-agent | Official source with all built-in Skills and authoring specs |

Publishing a Skill Tap: Create a GitHub repo as the skill source. Team members run hermes skills tap add github:your-org/your-skills-tap to subscribe. Private repos need --token $GH_TOKEN. Run hermes skills tap update to pull changes. Optional skills.sh.json controls Hub category display. Version-control ~/.hermes/skills/ in Git for cross-device sync, then run hermes skills reset to rebuild built-ins. Because agentskills.io is an open standard, validate with skills-ref validate ./my-skill before publishing.

The links below are canonical spec and ecosystem entry points. Re-open them after upstream releases in case fields changed.

  • Hermes Agent Skills system documentation
  • Creating Skills developer guide
  • agentskills.io open standard specification
  • hermes-agent-self-evolution (GEPA tooling repository)

GEPA (Genetic-Pareto Prompt Evolution) is a 2026 ICLR Oral result integrated in hermes-agent-self-evolution. The loop: analyze execution traces, generate variants, run multi-objective Pareto optimization, and improve the SKILL.md text itself—no model fine-tuning. Each optimization run costs roughly $2–10 in API calls with no GPU required.

Five stages: ① trace collection (full reasoning traces in SQLite) → ② reflective failure analysis (actionable side information) → ③ targeted mutation (10–20 SKILL.md variants per failure) → ④ multi-objective Pareto evaluation (success rate × token efficiency × speed) → ⑤ human PR review (best variant ships).
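
Stage ④ relies on Pareto dominance: a variant survives only if no other variant is at least as good on every objective and strictly better on one. The sketch below shows that general idea over the three objectives named above. It is not the hermes-agent-self-evolution implementation, and the variant names and scores are made up for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    success_rate: float  # higher is better
    tokens: float        # lower is better
    seconds: float       # lower is better


def dominates(a: Variant, b: Variant) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = (a.success_rate >= b.success_rate
                and a.tokens <= b.tokens
                and a.seconds <= b.seconds)
    strictly_better = (a.success_rate > b.success_rate
                       or a.tokens < b.tokens
                       or a.seconds < b.seconds)
    return no_worse and strictly_better


def pareto_front(variants):
    """Keep only the variants that no other variant dominates."""
    return [v for v in variants if not any(dominates(o, v) for o in variants)]


candidates = [
    Variant("baseline", 0.70, 1200, 30),
    Variant("v2", 0.85, 1100, 28),   # beats baseline on all three objectives
    Variant("v3", 0.90, 1500, 35),   # best success rate, but more tokens and time
    Variant("v4", 0.80, 1300, 40),   # dominated by v2
]
print([v.name for v in pareto_front(candidates)])  # ['v2', 'v3']
```

Both v2 and v3 stay on the front because neither beats the other on every objective, which is why a human reviewer still picks the variant that ships.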

evolve_skill quick start
export HERMES_AGENT_PATH=~/.hermes
python -m evolution.skills.evolve_skill \
        --skill github-code-review \
        --iterations 10 \
        --eval-source sessiondb

Four safety guardrails apply: the full test suite must pass 100%; Skills are capped at 15KB and tool descriptions at 500 characters; Prompt Cache must not break; and semantic preservation checks keep the original intent. Official roadmap: Phase 1 Skill files (done) → Phase 2 tool descriptions → Phase 3 system prompt → Phase 4 tool implementation code → Phase 5 fully automated loop. Because Skills follow agentskills.io, you can also feed in mixed traces from Claude Code or Gemini CLI: --eval-source mixed --trace-dirs ~/.claude/traces,~/.hermes/sessions.
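
The two size guardrails are simple enough to pre-check locally before a variant reaches review. This is a minimal sketch with hypothetical names; reading "15KB" as 15 × 1024 UTF-8 bytes is an assumption, so confirm the exact limit in the self-evolution repository.

```python
MAX_SKILL_BYTES = 15 * 1024   # assumption: "15KB" read as 15 * 1024 bytes
MAX_TOOL_DESC_CHARS = 500


def size_guardrail_violations(skill_text: str, tool_descriptions: dict) -> list:
    """Return a message for every size limit that a candidate breaks."""
    violations = []
    size = len(skill_text.encode("utf-8"))
    if size > MAX_SKILL_BYTES:
        violations.append(f"skill is {size} bytes, limit {MAX_SKILL_BYTES}")
    for tool, desc in tool_descriptions.items():
        if len(desc) > MAX_TOOL_DESC_CHARS:
            violations.append(
                f"tool '{tool}' description is {len(desc)} chars, limit {MAX_TOOL_DESC_CHARS}"
            )
    return violations


# Hypothetical candidate: an oversized skill body and one overlong tool description.
print(size_guardrail_violations("a" * 16000, {"web_search": "x" * 501}))
```

The test-suite, Prompt Cache, and semantic preservation guardrails need the real tooling and are not modeled here.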

Plugin-Bundled Skills load under the plugin:skill namespace. They do not appear in the default skills_list and activate only on explicit user invocation. Skills within the same plugin can cross-reference each other. Declare skill paths in the plugin's plugin.yaml.
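
As a small illustration of the namespace rule, the sketch below splits a plugin:skill reference and filters namespaced skills out of a default listing. The helper names and the my-plugin example are hypothetical; this is not how Hermes builds skills_list internally.

```python
from typing import Optional, Tuple


def split_skill_ref(ref: str) -> Tuple[Optional[str], str]:
    """Split 'plugin:skill' into (plugin, skill); plain names get plugin=None."""
    plugin, sep, skill = ref.partition(":")
    return (plugin, skill) if sep else (None, ref)


def default_skill_list(refs):
    """Mimic the default listing: plugin-namespaced skills are left out."""
    return [r for r in refs if split_skill_ref(r)[0] is None]


refs = ["arxiv", "my-plugin:deploy", "github-pr-workflow"]
print(split_skill_ref("my-plugin:deploy"))  # ('my-plugin', 'deploy')
print(default_skill_list(refs))             # ['arxiv', 'github-pr-workflow']
```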

Engineering tips that separate good Skills from noisy ones:

  • Description drives activation: Write "Use when reviewing a pull request...Do NOT use for writing new code" instead of "Helps with code."
  • Pitfalls are the quality bar: List concrete failure modes, root causes, and fixes (fragile CSS selectors, GitHub API rate limits, large diff token overflow).
  • Script when possible: Reference scripts/extract_schema.py in Procedure; on failure, load references/manual-extract.md.
  • Size discipline: Under 500 lines stays in SKILL.md; 500–1000 lines move detail to references; over 15KB must split (GEPA limit).
  • skill_manage: The agent can call skill_manage with action='patch' or action='create' to maintain skills dynamically. Set agent_writes_require_approval: true in config.yaml to put a human gate on those writes.

Blog workflow case: Create a blog-workflow Bundle packing seo-keyword-research, outline-generator, code-example-validator, bilingual-checker, and publish-to-platform. The instruction enforces SEO research first, runnable code examples, and bilingual titles. After editing a Skill, the current session does not pick up changes—run /reset or install with --now (which invalidates Prompt Cache).
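
For the blog-workflow case, the Bundle file could be generated with a few lines of string formatting that mirror the backend-dev.yaml shape shown earlier. This is a sketch: render_bundle_yaml is a hypothetical helper, and the description and instruction wording are placeholders, not text from a published Bundle.

```python
def render_bundle_yaml(name, description, skills, instruction_lines):
    """Render a Skill Bundle file using the name/description/skills/instruction keys."""
    lines = [f"name: {name}", "description: |", f"  {description}", "skills:"]
    lines += [f"  - {skill}" for skill in skills]
    lines.append("instruction: |")
    lines += [f"  {line}" for line in instruction_lines]
    return "\n".join(lines) + "\n"


blog_bundle = render_bundle_yaml(
    name="blog-workflow",
    description="Blog post pipeline from keyword research to publishing.",  # placeholder
    skills=[
        "seo-keyword-research",
        "outline-generator",
        "code-example-validator",
        "bilingual-checker",
        "publish-to-platform",
    ],
    instruction_lines=[
        "Always run SEO research before outlining.",
        "Every code example must be runnable.",
        "Titles must be bilingual.",
    ],
)
print(blog_bundle)
```

Saving the output as ~/.hermes/skill-bundles/blog-workflow.yaml would follow the file location described for Bundles above.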

FAQ highlights: Skills teach workflow; MCP exposes tool interfaces—they are not interchangeable. GEPA variants pass four guardrails plus human PR review, but you should still diff every change. Copy SKILL.md to ~/.claude/skills/ or install from kevinnft/ai-agent-skills for multi-tool reuse. Write descriptions in English or bilingual English-Chinese for better LLM routing.

  1. Install Hermes and official skills: Run hermes skills install official/research/arxiv and confirm ~/.hermes/skills/ layout.
  2. Author your first SKILL.md: Set required name and description (Use when...). Include Procedure and Pitfalls. Validate with skills-ref validate.
  3. Create a Skill Bundle: Write YAML in ~/.hermes/skill-bundles/ or use hermes bundles create. Test /bundle-name loads all skills together.
  4. Configure conditional activation: Add requires/fallback rules under metadata.hermes. Toggle API keys to verify show/hide logic.
  5. Subscribe to a community Tap: hermes skills tap add github:ChuckSRQ/awesome-hermes-skills, then tap update to stay current.
  6. Publish a team Tap: Create a GitHub repo with skills.sh.json. Members run tap add; private repos need a token.
  7. Run GEPA evolution: Clone hermes-agent-self-evolution, run evolve_skill --eval-source sessiondb, and review the generated PR.
  8. Deploy to an always-on remote Mac: Sync ~/.hermes/ to a dedicated Apple Silicon node so Gateway and skill directories stay online for continuous trace collection.

  • Level 0 metadata overhead: All skills' name+description combined ~3K tokens (Hermes docs; verify against your version).
  • GEPA per-run cost: ~$2–10 in API calls, no GPU (NousResearch self-evolution project).
  • GEPA size guardrails: Skill files ≤15KB, tool descriptions ≤500 characters.
  • Frontmatter limits: name ≤64 chars; description ≤1024 chars; SKILL.md body target ≤500 lines.
  • Cross-platform standard: agentskills.io Skills run on Hermes, Claude Code, Cursor, and OpenCode—reducing vendor lock-in.

Running Hermes Skills and GEPA evolution needs an environment that stays online, writes traces continuously, and exposes native macOS tooling. Common substitutes each fail a requirement: a personal MacBook sleeps on lid-close and interrupts sessiondb collection; a Linux VPS cannot run Xcode or Metal-dependent skill scripts; a shared virtual Mac adds multi-tenant contention that skews evolution benchmarks. For iOS CI/CD, Telegram Gateway, and AI Agent self-evolution in production, NOVAKVM Mac Mini M4/M4 Pro bare-metal cloud rental provides dedicated Apple Silicon, six-region nodes, and elastic day/week/month terms—Skills define how work runs; the remote node keeps it running. See the pricing page and help center.