execkit

Stateful, structured, safe shell sessions for AI agents, on real infrastructure.

When you give an AI agent a raw shell, you get a black box: one-shot commands with no memory between them, a wall of mixed stdout and stderr the agent has to guess its way through, secrets landing in plaintext, no record of what ran, and no way to undo a mistake. execkit replaces that with a session abstraction built for agents.

  • Stateful sessions. cd and environment persist across calls, like a real terminal, instead of every command starting fresh from home.
  • Structured results. Each command returns split stdout/stderr, an exit code, duration, and cwd as data the agent can act on, not a blob to parse.
  • Safe by default. Output is ANSI-stripped, secret-redacted, and bounded so a noisy build cannot blow the agent’s context window or leak credentials.
  • Real transports. Local shell, SSH, and Docker, with host-key verification and a sandboxed key directory.
  • Undo. Git-backed workspace checkpoints let you snapshot before a risky change and restore on demand (files only, not side effects).
  • Observability. An append-only audit log, a live read-only viewer, and live MCP notifications so you can watch what the agent does in the shell.

Two ways to use it

  • As an MCP server (execkit-mcp): a stdio Model Context Protocol server that any MCP-capable agent (Claude Code, Cursor, Gemini CLI, and others) can drive directly. Start at Installation.
  • As a Rust library (execkit): embed sessions in your own program. See the Rust library page. A Python SDK wraps the same core.

A note on safety

The agent driving these tools can be prompt-injected, so execkit treats every tool argument as untrusted. Anything dangerous to the host is controlled by the operator at startup (environment variables), never by a per-call agent argument. The command allow/deny fence is advisory defense-in-depth, not a sandbox: the real boundary is running the agent as a least-privilege user, in a container, or on a scoped SSH account. See the Security model.

execkit is Apache-2.0 licensed. Source: https://github.com/blinkingbit-oss/execkit.

Installation

execkit-mcp is available as a prebuilt wheel from PyPI or through cargo, which compiles it locally. Pick whichever fits your toolchain:

pip install execkit-mcp      # a wheel; no Rust toolchain needed
cargo install execkit-mcp    # ...or via cargo

Building from source instead:

cargo build -p execkit-mcp --release   # binary at target/release/execkit-mcp

Verify the install

execkit-mcp --version
execkit-mcp doctor

doctor reports what is configured and what is missing before you ever wire an agent in: whether an audit destination is set and writable, where the SSH key directory and known_hosts resolve to, and whether the Docker daemon is reachable. A typical run:

execkit-mcp 0.8.0
[ -- ] binary: /home/you/.cargo/bin/execkit-mcp

[ -- ] audit: off (set EXECKIT_MCP_AUDIT or EXECKIT_MCP_AUDIT_DIR to record + watch activity)
[ ok ] ssh key dir: /home/you/.ssh (override: EXECKIT_MCP_KEY_DIR)
[ ok ] known_hosts: /home/you/.ssh/known_hosts
[ ok ] docker: daemon reachable

Each [warn] or [ -- ] line tells you what to set. None of these are required to start; they only enable optional features (auditing, SSH, Docker).
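If you want to check a doctor report from a script (for example in a provisioning step), the status lines are regular enough to parse. The following is a minimal sketch that assumes the line shape shown in the sample run above, a bracketed status followed by a `name: detail` pair. The function names `parse_doctor` and `not_ok` are hypothetical and not part of execkit.

```python
import re

# Matches lines such as "[ ok ] docker: daemon reachable".
# The three status tokens are taken from the sample report above.
STATUS_LINE = re.compile(r"^\[\s*(ok|--|warn)\s*\]\s+([^:]+):\s*(.*)$")


def parse_doctor(report: str) -> list:
    """Return (status, check, detail) for every status line in the report."""
    rows = []
    for line in report.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            rows.append((match.group(1), match.group(2).strip(), match.group(3)))
    return rows


def not_ok(rows: list) -> list:
    """Names of the checks whose status is anything other than ok."""
    return [check for status, check, _ in rows if status != "ok"]


SAMPLE = """execkit-mcp 0.8.0
[ -- ] binary: /home/you/.cargo/bin/execkit-mcp

[ -- ] audit: off (set EXECKIT_MCP_AUDIT or EXECKIT_MCP_AUDIT_DIR to record + watch activity)
[ ok ] ssh key dir: /home/you/.ssh (override: EXECKIT_MCP_KEY_DIR)
[ ok ] known_hosts: /home/you/.ssh/known_hosts
[ ok ] docker: daemon reachable
"""

rows = parse_doctor(SAMPLE)
print(not_ok(rows))
```

Lines marked [ -- ] can be purely informational (the binary path above is one), so treat the result as a list to review rather than a list of failures.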

Where the binary lives

cargo install puts it at ~/.cargo/bin/execkit-mcp. If that is not on your MCP client’s PATH, use the full path when you register it (the next page shows how, and execkit-mcp setup fills the absolute path in for you).

Next: Wiring into an agent.

Wiring into an agent

execkit-mcp is a stdio MCP server. You register the installed binary with your client once, and the agent gains session_create, session_exec, and the rest.

The fastest path is to let execkit print the exact config with the binary’s absolute path already filled in:

execkit-mcp setup claude     # or: cursor | gemini

It prints a ready-to-use block (and, for Claude Code, the one-line command). It deliberately does not edit your client’s config file for you, so it can never corrupt one; you paste the block into the right place.

Claude Code

One command:

claude mcp add execkit -- execkit-mcp        # add `-s user` to enable it everywhere

Cursor and Gemini CLI

Cursor reads ~/.cursor/mcp.json (or .cursor/mcp.json in a project); Gemini CLI reads ~/.gemini/settings.json. Add the same block to either:

{
  "mcpServers": {
    "execkit": { "command": "execkit-mcp" }
  }
}

If the binary is not on the client’s PATH, use the absolute path that execkit-mcp setup printed.

Turning on operator settings

Anything that affects the host (auditing, SSH key location, session limits) is configured by you, the operator, through environment variables in the client config, not by the agent. Add an env block:

{
  "mcpServers": {
    "execkit": {
      "command": "execkit-mcp",
      "env": { "EXECKIT_MCP_AUDIT": "/var/log/execkit.jsonl" }
    }
  }
}
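If you manage client configs from a script, the same block can be generated rather than typed. This is a small sketch, not something execkit provides: `mcp_config` is a hypothetical helper, and it only prints the block so that pasting it into the client config stays a manual step.

```python
import json
import shutil
from typing import Dict, Optional


def mcp_config(command: str, env: Optional[Dict[str, str]] = None) -> dict:
    """Build the mcpServers block for a single execkit entry."""
    entry: dict = {"command": command}
    if env:
        entry["env"] = dict(env)
    return {"mcpServers": {"execkit": entry}}


# Use the absolute path when the binary is on PATH; otherwise keep the bare name.
command = shutil.which("execkit-mcp") or "execkit-mcp"
block = mcp_config(command, {"EXECKIT_MCP_AUDIT": "/var/log/execkit.jsonl"})
print(json.dumps(block, indent=2))
```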

See the Security model for the full list of settings and why they live with the operator. Once wired, the agent calls session_create -> session_exec -> session_destroy; see Sessions.

Sessions

A session is a live shell that outlives a single tool call. The agent opens one, runs as many commands as it likes against it (state carries over), and closes it.

The tools

Tool | Arguments | Returns
session_create | transport ("local" / "ssh" / "docker") plus transport options (see Transports); optional allow / deny lists; optional output_budget | { "session_id": "..." }
session_exec | session_id, command, optional budget | structured ExecResult
session_destroy | session_id | { "destroyed": true }

Remote sessions add session_checkpoint, session_checkpoints, and session_restore; see Checkpoints.

State persists

Sessions are stateful. cd, exported variables, and shell state carry across session_exec calls, the way a real terminal works:

// session_exec {"session_id":"1_local","command":"cd /srv/app && export ENV=prod"}
// session_exec {"session_id":"1_local","command":"pwd"}   -> stdout: "/srv/app"
// session_exec {"session_id":"1_local","command":"echo $ENV"} -> stdout: "prod"
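To see why state carries over, it helps to picture one long-lived shell process that every call writes to. The sketch below illustrates that idea with the Python standard library on a POSIX system with `sh` available. It is not how execkit is implemented: `TinySession` is a hypothetical name, and unlike execkit it merges stderr into stdout and does no redaction or bounding.

```python
import subprocess
import uuid


class TinySession:
    """One long-lived `sh` process; every exec() talks to the same shell."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            ["sh"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merged here; execkit keeps them split
            text=True,
        )

    def exec(self, command: str) -> tuple:
        # A unique marker line shows where this command's output ends and
        # carries the exit status of the command that ran before it.
        marker = "done_" + uuid.uuid4().hex
        self.proc.stdin.write(command + "\n")
        self.proc.stdin.write("printf '\\n%s %s\\n' " + marker + ' "$?"\n')
        self.proc.stdin.flush()
        lines = []
        for line in self.proc.stdout:
            if line.startswith(marker + " "):
                return "".join(lines).strip("\n"), int(line.split()[1])
            lines.append(line)
        raise RuntimeError("shell exited before the command finished")

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()


session = TinySession()
session.exec("export ENV=prod")   # state set by one call...
print(session.exec("echo $ENV"))  # ...is visible to the next
session.close()
```

A fresh subprocess per command would lose the exported variable, which is the one-shot behaviour that sessions are meant to replace.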

This is the difference between a shell and a series of unrelated one-shot commands: an agent that runs cd packages/api and then npm test gets the test run in packages/api, not back in the home directory.

Structured results

session_exec returns an ExecResult as JSON, not a blob:

// session_exec {"session_id":"1_local","command":"npm run build"}
{
  "stdout": "...",
  "stderr": "Error: Cannot find module 'webpack'",
  "exit_code": 1,
  "duration_ms": 3420,
  "cwd": "/home/u/app",
  "truncated": false
}

stdout and stderr are split, so the agent never has to guess whether output was an error. The exit code is authoritative. Output is ANSI-stripped and secret-redacted before it is returned, and bounded so one command cannot flood the agent’s context (see Output budgets).
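Because the result is data, the consuming side can branch on fields instead of scanning text. Here is a minimal sketch of that pattern using the field names from the example above. The helpers `parse_result` and `next_step` are hypothetical, and unknown keys (such as a budget report) are ignored rather than modelled.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ExecResult:
    stdout: str
    stderr: str
    exit_code: int
    duration_ms: int
    cwd: str
    truncated: bool


def parse_result(raw: str) -> ExecResult:
    """Load a session_exec result, keeping only the fields modelled here."""
    data = json.loads(raw)
    known = {f.name for f in fields(ExecResult)}
    return ExecResult(**{k: v for k, v in data.items() if k in known})


def next_step(result: ExecResult) -> str:
    """Turn one result into a one-line decision."""
    if result.exit_code != 0:
        # stderr is already separate, so there is no guessing about which
        # lines were errors; report its last non-empty line.
        err_lines = result.stderr.strip().splitlines()
        reason = err_lines[-1] if err_lines else "no stderr"
        return f"failed (exit {result.exit_code}) in {result.cwd}: {reason}"
    note = " (output truncated)" if result.truncated else ""
    return f"ok in {result.duration_ms} ms{note}"


RAW = """{
  "stdout": "...",
  "stderr": "Error: Cannot find module 'webpack'",
  "exit_code": 1,
  "duration_ms": 3420,
  "cwd": "/home/u/app",
  "truncated": false
}"""

print(next_step(parse_result(RAW)))
```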

Session ids are self-identifying

Ids read as <n>_local, <n>_ssh_<user>@<host>[:port], or <n>_docker_<container>, so logs and the watch viewer are legible at a glance. Agent-provided host/user/container names are sanitized before they appear in an id or a filename.
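Since the id encodes the transport, a log-processing script can recover it without a lookup. The pattern below is a sketch built only from the three shapes listed above. It assumes that `<n>` is a decimal number and that a sanitized host contains no colon other than the port separator; `parse_session_id` is a hypothetical helper.

```python
import re
from typing import Optional

SESSION_ID = re.compile(
    r"^(?P<n>\d+)_(?:"
    r"(?P<local>local)"
    r"|ssh_(?P<user>[^@]+)@(?P<host>[^:]+)(?::(?P<port>\d+))?"
    r"|docker_(?P<container>.+)"
    r")$"
)


def parse_session_id(session_id: str) -> Optional[dict]:
    """Split a session id into its parts, or return None if it does not fit."""
    match = SESSION_ID.match(session_id)
    if match is None:
        return None
    n = int(match.group("n"))
    if match.group("local"):
        return {"n": n, "transport": "local"}
    if match.group("user"):
        port = match.group("port")
        return {
            "n": n,
            "transport": "ssh",
            "user": match.group("user"),
            "host": match.group("host"),
            "port": int(port) if port else None,  # None when no port is in the id
        }
    return {"n": n, "transport": "docker", "container": match.group("container")}


for sid in ("1_local", "2_ssh_deploy@web-01", "3_docker_app-web-1"):
    print(parse_session_id(sid))
```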

Lifecycle and limits

Sessions are reaped when idle (default 30 minutes) to free the process and a slot against the concurrent-session cap (default 64). Both are operator-tunable; see the Security model. Always session_destroy when done.

Transports

session_create takes a transport: "local", "ssh", or "docker".

Local

A shell on the machine running the server.

// session_create {"transport":"local"}  -> {"session_id":"1_local"}

The agent reaches whatever the server’s user can. Run that user with least privilege.

SSH

// session_create {
//   "transport":"ssh", "host":"web-01", "user":"deploy",
//   "key_path":"deploy_ed25519"            // or "password":"..."
// }

Required: host, user, and one of password or key_path. Optional: port (default 22) and fingerprint to pin an exact host key.

Host-key handling is safe by default:

  • Verified against known_hosts (TOFU). A changed key is rejected as a likely man-in-the-middle. The file is ~/.ssh/known_hosts unless EXECKIT_MCP_KNOWN_HOSTS overrides it, and the first connection records the key.
  • Pin a key by passing fingerprint for an exact match.
  • key_path is sandboxed. It must canonicalize to inside the key directory (~/.ssh by default, or EXECKIT_MCP_KEY_DIR). Out-of-bounds or traversal paths are rejected with a generic error that does not leak whether the path exists.
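The sandboxing rule in the last bullet (canonicalize first, then check containment, then fail with one generic error) can be sketched as follows. This illustrates the technique on a POSIX filesystem and is not execkit's code; `resolve_key_path` and `KeyPathError` are hypothetical names.

```python
import os


class KeyPathError(Exception):
    """Deliberately carries no detail about the path."""


def resolve_key_path(key_dir: str, key_path: str) -> str:
    """Return the canonical key file path, or raise a generic error."""
    root = os.path.realpath(key_dir)
    # realpath follows symlinks and collapses "..", so traversal and
    # symlink escapes both resolve to somewhere outside `root`.
    # os.path.join discards `root` when key_path is absolute, so an
    # absolute path is checked the same way.
    candidate = os.path.realpath(os.path.join(root, key_path))
    inside = os.path.commonpath([root, candidate]) == root
    if not inside or candidate == root or not os.path.isfile(candidate):
        # One message for "outside the sandbox" and "does not exist",
        # so the caller cannot probe which paths are present.
        raise KeyPathError("invalid key_path")
    return candidate
```

Comparing canonical paths rather than raw strings is the important part: a prefix check on the unresolved string would accept `deploy/../../etc/passwd`.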

The home directory behind ~/.ssh resolves by priority ($HOME, then the system passwd database), so defaults are correct even when $HOME is unset, as in a service-launched server.

For throwaway or test hosts only, EXECKIT_MCP_INSECURE_ACCEPT_ANY_HOSTKEY=1 disables host-key verification. Never use it in production.

Docker

// session_create {"transport":"docker","container":"app-web-1"}

Runs docker exec against any container the daemon can see, so the agent reaches whatever your Docker context exposes. Grant the server Docker access only when you want that, and scope the daemon or context accordingly.

Remote workspace undo

SSH and Docker sessions support Checkpoints: a git-backed snapshot of the workspace you can restore on demand.

Output budgets

A noisy command can dump thousands of lines. That hurts twice: it costs the agent context window, and the volume itself degrades the agent’s reasoning. Output budgets shape a command’s output before it reaches the model.

Pass budget to session_exec, or output_budget to session_create for a session default:

// keep only the last 200 lines of a noisy build
{ "session_id": "1_local", "command": "npm run build",
  "budget": { "keep": { "mode": "tail", "n": 200 } } }

// grep a 50k-line log for errors, with 2 lines of context around each
{ "session_id": "1_local", "command": "cat big.log",
  "budget": { "grep": { "pattern": "error|fail", "context": 2 } } }

Keep modes are tail, head, and head_tail; grep is a separate filter; both honor a max_chars cap.

Shaping is line-based, applied client-side, and runs after secret redaction. It never changes the exit code or any side effect of the command, only what text comes back. When a budget is applied, the result carries a budget report so the agent knows the output was shaped:

"budget": {
  "stdout": { "mode": "tail", "lines_total": 4123, "lines_kept": 200 },
  "stderr": { "mode": "tail", "lines_total": 12, "lines_kept": 12 }
}

Use budgets liberally on commands you expect to be loud (builds, installs, big log reads); the agent keeps the signal without the noise.
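To make the line-based shaping concrete, here is a rough sketch of a tail/head keep and a grep with context, producing a report shaped like the one above. It is an approximation under stated assumptions, not execkit's implementation: the function names are hypothetical, matching is case-sensitive as in Python's `re`, and `head_tail` and `max_chars` are left out because this page does not define how they split or truncate.

```python
import re


def budget_keep(text: str, mode: str, n: int) -> tuple:
    """Keep the first or last n lines and report what was kept."""
    lines = text.splitlines()
    if mode == "tail":
        kept = lines[-n:] if n > 0 else []
    elif mode == "head":
        kept = lines[:n]
    else:
        raise ValueError("this sketch only covers tail and head")
    report = {"mode": mode, "lines_total": len(lines), "lines_kept": len(kept)}
    return "\n".join(kept), report


def budget_grep(text: str, pattern: str, context: int = 0) -> list:
    """Keep matching lines plus `context` lines on each side, in order."""
    lines = text.splitlines()
    regex = re.compile(pattern)
    wanted = set()
    for index, line in enumerate(lines):
        if regex.search(line):
            start = max(0, index - context)
            stop = min(len(lines), index + context + 1)
            wanted.update(range(start, stop))
    return [lines[i] for i in sorted(wanted)]


noisy = "\n".join(f"line {i}" for i in range(4123))
shaped, report = budget_keep(noisy, "tail", 200)
print(report)

log = "a\nb\nerror x\nc\nd\ne\nfail y"
print(budget_grep(log, "error|fail", context=1))
```

Overlapping context windows are merged through the set of line indexes, so a line that sits between two matches is emitted once.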

Checkpoints

On SSH and Docker sessions, execkit can snapshot the workspace before a command that changes files and restore it on demand: a filesystem “undo” for agent actions.

It undoes files only, never side effects. A dropped database stays dropped, a sent email stays sent, an installed package stays installed. For “the agent mangled my source tree,” that is exactly the recovery you want.

Tools

Tool | Arguments | Returns
session_checkpoint | session_id, optional label | { "checkpoint_id": "..." }
session_checkpoints | session_id | [{ id, label, created }]
session_restore | session_id, optional checkpoint_id | { restored_to, files_changed }

Omit checkpoint_id on restore to roll back to the most recent checkpoint.

Enabling it

Two requirements:

  1. git on the remote host. Checkpoints use a shadow git repo. If git is absent, auto-snapshot disables itself and checkpoint calls return a clear “install git on the remote” error.
  2. An explicit workspace on session_create. Without it, checkpoints and auto-snapshot are off. execkit will not default to the cwd or $HOME, because snapshotting a home directory is slow and would capture secrets. Set workspace to the project directory you want undo for (pass $HOME explicitly only if you truly mean it).

Control it via session_create:

  • workspace (root; REQUIRED to enable checkpoints)
  • auto_snapshot (default true; effective only with a workspace)
  • paths (sub-directories under the root to track)
  • checkpoint_ignores (extra gitignore-style patterns, added to the built-in defaults: .git, node_modules, build dirs, caches, .ssh, .aws, …)

Restore is destructive

Warning. session_restore reverts tracked files and deletes all untracked files and directories anywhere under the workspace (via git clean), not only files created since the checkpoint. Do not restore if untracked files in the workspace must be preserved.

Security model

The agent driving these tools can be prompt-injected, so execkit treats every tool argument as untrusted. Anything dangerous to the host or filesystem is controlled by the operator at startup through environment variables, never by a per-call agent argument. An injected agent cannot change where the audit log is written, which directory SSH keys come from, or the session limits.

Operator settings

Env var | Purpose | Default
EXECKIT_MCP_AUDIT | append a JSONL audit log of every command here | off
EXECKIT_MCP_AUDIT_DIR | one JSONL file per session in this directory (<session_id>-<open_ms>.jsonl); takes precedence over EXECKIT_MCP_AUDIT | off
EXECKIT_MCP_AUDIT_RETENTION_DAYS | delete per-session log files older than N days at startup (dir mode only); 0 disables | 14
EXECKIT_MCP_KEY_DIR | SSH key_path must canonicalize to inside this dir | ~/.ssh
EXECKIT_MCP_KNOWN_HOSTS | SSH host-key verification file (TOFU; rejects changed keys) | ~/.ssh/known_hosts
EXECKIT_MCP_INSECURE_ACCEPT_ANY_HOSTKEY | DANGEROUS: disable host-key checks | unset
EXECKIT_MCP_MAX_SESSIONS | soft cap on concurrent live sessions | 64
EXECKIT_MCP_SESSION_TTL | reap sessions idle longer than N seconds; 0 disables | 1800
EXECKIT_MCP_POLICY_FILE | JSON allow/deny (program names) + deny_patterns (regex) the agent cannot edit; advisory | off

EXECKIT_MCP_KEY_DIR and EXECKIT_MCP_KNOWN_HOSTS default to paths under the home directory, which resolves by priority ($HOME, then the passwd database), so the defaults are correct even when $HOME is unset. Run execkit-mcp doctor to see what each one resolves to on your machine.
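The operator-at-startup pattern amounts to reading the environment once and never accepting these values from a tool call. Below is a sketch of that pattern using the variable names, defaults, and precedence documented on this page. The `OperatorSettings` type and `load_settings` function are hypothetical and cover only a subset of the variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)  # frozen: nothing can change the settings after startup
class OperatorSettings:
    audit_file: Optional[str]
    audit_dir: Optional[str]
    retention_days: int
    max_sessions: int
    session_ttl_s: int
    insecure_hostkey: bool


def load_settings(env: Mapping[str, str] = os.environ) -> OperatorSettings:
    """Read operator settings once, applying the documented defaults."""
    audit_dir = env.get("EXECKIT_MCP_AUDIT_DIR") or None
    # Directory mode takes precedence when both destinations are set.
    audit_file = None if audit_dir else (env.get("EXECKIT_MCP_AUDIT") or None)
    return OperatorSettings(
        audit_file=audit_file,
        audit_dir=audit_dir,
        retention_days=int(env.get("EXECKIT_MCP_AUDIT_RETENTION_DAYS", "14")),
        max_sessions=int(env.get("EXECKIT_MCP_MAX_SESSIONS", "64")),
        session_ttl_s=int(env.get("EXECKIT_MCP_SESSION_TTL", "1800")),
        insecure_hostkey=env.get("EXECKIT_MCP_INSECURE_ACCEPT_ANY_HOSTKEY") == "1",
    )


print(load_settings({"EXECKIT_MCP_AUDIT": "/var/log/execkit.jsonl"}))
```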

What is enforced where

  • Host keys are verified by default (TOFU against known_hosts; a changed key is rejected as a likely MITM). Pin an exact key with fingerprint, or set the insecure env var only for throwaway hosts.
  • key_path is sandboxed to EXECKIT_MCP_KEY_DIR; traversal or out-of-bounds paths are rejected with a generic error that does not leak path existence.
  • The audit destination is operator-chosen, never a tool argument, so an injected agent cannot write to arbitrary files.
  • Docker sessions reach any container the daemon can see. Grant Docker access deliberately and scope the context.
  • The server speaks MCP on stdout; all diagnostics go to stderr.

The fence is advisory, not a sandbox

allow / deny command lists are defense in depth, not a jail. Matching on command strings is trivially bypassable (/bin/rm, $(echo rm), base64, bash -c "..."). Treat the fence as a guardrail against accidents and obvious mistakes.

The real security boundary is the operating system: run the agent’s shell as a least-privilege user, in a container, or on a scoped SSH account, so that even a fully compromised agent can only reach what that account can. execkit gives you visibility and undo on top of that boundary; it does not replace it.

Operator command policy

Point EXECKIT_MCP_POLICY_FILE at a JSON file to set an allow/deny fence the agent cannot edit (unlike the per-call allow/deny, which the agent supplies):

{
  "allow": ["git", "ls", "npm"],
  "deny": ["rm", "dd", "shutdown"],
  "deny_patterns": ["\\brm\\b", "kubectl\\s+delete", "git\\s+push\\s+.*--force"]
}

  • allow (program names): if non-empty, only these may run. Empty/absent = all.
  • deny (program names): always blocked; deny wins over allow.
  • deny_patterns (regex over the whole command): for what names cannot express.

Prefer a deny_pattern over a name deny for anything that matters: name matching only sees the program name per pipeline segment, so deny: ["rm"] misses sudo rm and xargs rm, while deny_patterns: ["\\brm\\b"] catches them. In JSON the regex backslashes double up (\\b); use (?i) for case-insensitive matching.
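The gap between name matching and pattern matching is easy to demonstrate. The checker below is a simplified sketch of the rules described above, not execkit's matcher: it assumes pipeline segments split on `|`, `;`, `&&`, and `||`, takes the first word of each segment as the program name, and uses Python's `re` (the `\b` and `(?i)` syntax used here behaves the same way). The names `program_names` and `check` are hypothetical.

```python
import re


def program_names(command: str) -> list:
    """First word of each pipeline segment."""
    segments = re.split(r"\|\||&&|[|;]", command)
    return [seg.split()[0] for seg in segments if seg.split()]


def check(command: str, policy: dict) -> str:
    """Return "allowed" or a string starting with "blocked"."""
    names = program_names(command)
    # deny is checked first, so it wins over allow.
    for name in names:
        if name in policy.get("deny", []):
            return f"blocked: deny {name}"
    for pattern in policy.get("deny_patterns", []):
        if re.search(pattern, command):
            return f"blocked: deny_pattern {pattern}"
    allow = policy.get("allow", [])
    if allow:
        for name in names:
            if name not in allow:
                return f"blocked: {name} is not in allow"
    return "allowed"


by_name = {"deny": ["rm"]}
by_pattern = {"deny_patterns": [r"\brm\b"]}

for cmd in ("rm -rf build", "sudo rm -rf build", "find . | xargs rm"):
    print(f"{cmd!r}: name={check(cmd, by_name)} / pattern={check(cmd, by_pattern)}")
```

Neither form stops a determined bypass such as `bash -c "$(echo cm0= | base64 -d) -rf build"`, which is why the fence stays advisory.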

A blocked command never runs; it is recorded in the audit log, shown in watch, and pushed to the client as a warning. This is an ADVISORY guardrail, not a sandbox: string matching is trivially bypassable (/bin/rm, base64, bash -c). The real boundary is a least-privilege user, a container, or a scoped SSH account.

Auditing and the watch viewer

execkit can record everything an agent does in the shell, and let you watch it live.

The audit log

Point the server at a destination and every open / exec / close event is appended as JSON, with the session id, transport, an epoch-millisecond timestamp, and the command plus its (redacted, bounded) output.

  • EXECKIT_MCP_AUDIT=/var/log/execkit.jsonl writes one shared file for all sessions.
  • EXECKIT_MCP_AUDIT_DIR=/var/log/execkit/ writes one file per session, named <session_id>-<open_ms>.jsonl. This mode takes precedence when both are set.
  • EXECKIT_MCP_AUDIT_RETENTION_DAYS (default 14, 0 disables) prunes per-session files older than N days at startup. Files with a future-skewed mtime are never deleted.

The audit destination is operator-chosen and never a tool argument, so an injected agent cannot redirect or suppress it.
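The retention rule for directory mode (prune at startup, 0 disables, never delete a file whose mtime lies in the future) can be sketched in a few lines. This is an illustration of the rule as stated above rather than the server's code; `prune` is a hypothetical helper and the `now` parameter exists only to make it testable.

```python
import os
import time
from typing import Optional


def prune(audit_dir: str, retention_days: int, now: Optional[float] = None) -> list:
    """Delete per-session *.jsonl files older than retention_days.

    Returns the names of the files that were removed.
    """
    if retention_days <= 0:  # 0 disables pruning
        return []
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    removed = []
    for entry in os.scandir(audit_dir):
        if not (entry.is_file() and entry.name.endswith(".jsonl")):
            continue
        mtime = entry.stat().st_mtime
        if mtime > now:
            # Clock skew: the file claims to come from the future, so its
            # age cannot be trusted. Leave it alone.
            continue
        if mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Skipping future-dated files errs on the side of keeping evidence: a wrong clock should never be able to cause an audit file to disappear.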

The watch viewer

Point watch at the audit file or directory from another terminal:

execkit-mcp watch /var/log/execkit.jsonl   # or: execkit-mcp watch  (uses $EXECKIT_MCP_AUDIT)
execkit-mcp watch /var/log/execkit/        # a directory; uses $EXECKIT_MCP_AUDIT_DIR

It is a live, read-only TUI: the agent’s sessions on the left, the selected session’s shell transcript on the right (prompt, command, stdout, stderr in red, exit status), rendered like a normal shell rather than JSON. Switch sessions with 1-9 or the arrow keys, scroll with PgUp/PgDn, quit with q. It only ever reads the log. Because the data comes from the server, it works the same under any MCP client.

Browser viewer

For a richer view in a normal browser tab, serve the transcript as a local web page:

execkit-mcp watch --serve /var/log/execkit/        # prints a loopback URL with a token
execkit-mcp watch --serve --open /var/log/execkit/ # ...and open it in your browser

The MCP server also starts the viewer automatically when EXECKIT_MCP_WATCH_WEB is set: it binds 127.0.0.1 only, prints the tokened URL and pushes it to the client as a notification, and keeps the URL stable across restarts so an open tab reconnects. EXECKIT_MCP_WATCH_PORT sets the port (default 7878, falls back to a random one if taken) and EXECKIT_MCP_WATCH_OPEN also opens the browser for you.

The page is read-only and local by construction: it binds loopback only, every request needs the URL token, and it can read the audit stream but never touch a session, a command, or your files. What you get:

  • Sidebar grouped by transport (local / ssh / docker); a group header shows its session count, a session row shows its command count, and the active session is highlighted.
  • Colored transcript with a header legend (cmd / out / err / ok). Click a legend item to show or hide that line type.
  • Search: press / to find within the transcript, step matches with Enter / Shift+Enter, and press e to jump to the next error or blocked line.
  • History of past sessions (newest first, with relative times) when EXECKIT_MCP_AUDIT_DIR is set; click one to read its transcript. With a single EXECKIT_MCP_AUDIT file there is no per-session history.
  • Per-session actions from a 3-dots menu: rename (a display alias), pin, keep, export to .txt / .log / .md / .json, and screenshot to .png.
  • A status bar that shows the selected session’s details; click it to copy the session id.

Rename / pin / keep and the sidebar width persist in ~/.execkit/viewer-state.json (mode 0600). That file is the viewer’s only write surface: display metadata only, never able to affect a session, a command, or the audit log.

Headless follow mode

For a pipeable, no-TTY view, use --follow instead of the TUI. It prints each command and its output as a line prefixed with the session id, as it happens:

execkit-mcp watch --follow /var/log/execkit/
# [1_local] /home/u $ npm run build
# [1_local] x exit 1  (3420ms)
# [2_ssh_deploy@web-01] /srv $ systemctl restart app

Live notifications in the client

Even with no audit log configured, the server streams each command to the MCP client as it runs, so a host agent can surface its own shell activity without anyone opening a separate terminal. Every session_exec emits:

  • a log notification (notifications/message) carrying the full shell transcript, info on success and warning on a non-zero exit; and
  • a progress notification (notifications/progress) with a one-line summary, when the call supplied a progressToken.

This reveals nothing new: the client already receives the same output in the tool result, redacted and bounded. How the activity is surfaced is up to the client.
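On the client side, surfacing these comes down to dispatching on the notification method. The handler below is a sketch that assumes the standard MCP notification shapes as I understand them (`level` and `data` on log messages, `progressToken` and `message` on progress). How execkit fills `data` is not specified here, so it is treated as opaque text, and `surface` is a hypothetical function.

```python
from typing import Optional


def surface(notification: dict) -> Optional[str]:
    """Render one MCP notification as a display line, or None to ignore it."""
    method = notification.get("method")
    params = notification.get("params") or {}
    if method == "notifications/message":
        # info on success, warning on a non-zero exit (per the list above).
        flag = "!" if params.get("level") == "warning" else " "
        return f"{flag} {params.get('data', '')}"
    if method == "notifications/progress":
        return f"[{params.get('progressToken')}] {params.get('message', '')}"
    return None


events = [
    {"jsonrpc": "2.0", "method": "notifications/message",
     "params": {"level": "warning", "data": "$ npm run build (exit 1)"}},
    {"jsonrpc": "2.0", "method": "notifications/progress",
     "params": {"progressToken": "t1", "progress": 1, "message": "npm run build: exit 1"}},
]
for event in events:
    print(surface(event))
```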

CLI reference

execkit-mcp with no arguments is the stdio MCP server an agent launches. Everything below is for a human at a terminal.

Commands

execkit-mcp                          Run the MCP server on stdio (default)
execkit-mcp setup <client>           Print the config to wire execkit into a client
                                     client: claude | cursor | gemini
execkit-mcp doctor                   Check the local environment and print a report
execkit-mcp watch [--follow|--serve] [--open] <path>
                                     Live, read-only viewer (TUI, follow stream, or browser)
execkit-mcp --version                Print version
execkit-mcp --help                   Print help

setup <client>

Prints a ready MCP config block with this binary’s absolute path filled in, and for Claude Code the claude mcp add one-liner. It prints rather than edits your client’s live config, so it cannot corrupt one. See Wiring into an agent.

doctor

Reports the resolved audit destination and its writability, the SSH key directory and known_hosts (with the env var that overrides each), and whether the Docker daemon is reachable. Use it after install to catch setup problems before an agent connects. See Installation.

watch [--follow|--serve] [--open] <path>

A live, read-only viewer over the audit log, which can be a file or a directory. --follow gives a headless, pipeable stream instead of the TUI; --serve serves a loopback, token-gated web page instead, and --open also launches your browser at it. See Auditing and the watch viewer.

Environment

These configure the server (operator-controlled, not agent arguments). Full table and rationale on the Security model page.

EXECKIT_MCP_AUDIT                 Append a JSONL audit log of every command here
EXECKIT_MCP_AUDIT_DIR             One JSONL file per session in this directory
EXECKIT_MCP_AUDIT_RETENTION_DAYS  Prune per-session files older than N days (default 14)
EXECKIT_MCP_WATCH_WEB             Auto-start the loopback browser viewer and surface its URL
EXECKIT_MCP_WATCH_PORT            Port for the browser viewer (default 7878; random if taken)
EXECKIT_MCP_WATCH_OPEN            Also auto-open the browser at the viewer URL (default: link only)
EXECKIT_MCP_KEY_DIR               Directory SSH keys must live under (default ~/.ssh)
EXECKIT_MCP_KNOWN_HOSTS           SSH known_hosts file (default ~/.ssh/known_hosts)
EXECKIT_MCP_MAX_SESSIONS          Soft cap on concurrent live sessions (default 64)
EXECKIT_MCP_SESSION_TTL           Reap sessions idle longer than N seconds (default 1800)
EXECKIT_MCP_POLICY_FILE           JSON allow/deny + deny_patterns the agent cannot edit (advisory)

Rust library

The execkit crate is the core. The MCP server is a thin wrapper over it; you can embed the same sessions directly in your own program.

[dependencies]
execkit = "0.7"                                          # local + SSH + Docker
# execkit = { version = "0.7", default-features = false } # local + Docker only (no SSH; drops russh/tokio)

use execkit::{Session, Policy};

fn main() -> Result<(), execkit::Error> {
    let mut s = Session::local()?
        .with_policy(Policy { allow: vec![], deny: vec!["rm".into()] });

    let r = s.exec("echo hi; echo err 1>&2; cd /tmp")?;
    // r.stdout == "hi", r.stderr == "err", r.exit_code == 0, r.cwd == "/tmp"
    println!("{} (exit {})", r.stdout, r.exit_code);
    Ok(())
}

State persists across exec calls on the same Session, exactly as it does over MCP. Results are the same structured ExecResult (split stdout/stderr, exit code, duration, cwd), ANSI-stripped and secret-redacted.

SSH and Docker sessions are constructed with their configs:

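use execkit::{Session, SshConfig, SshAuth, HostKeyVerification};

// main returns a Result so that `?` can propagate errors, as in the local example.
fn main() -> Result<(), execkit::Error> {
    let cfg = SshConfig::new("web-01".into(), "deploy".into(),
        SshAuth::Password("...".into()),
        HostKeyVerification::KnownHosts("/home/me/.ssh/known_hosts".into()));
    let mut s = Session::ssh(cfg)?;

    // Run one command over the SSH session and print its stdout.
    let r = s.exec("uname -a")?;
    println!("{}", r.stdout);
    Ok(())
}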

The API surface stays small; the richness lives in the result, not the verbs:

Session::local() / ::ssh(cfg) / ::docker(container)        -> Session
session.exec(command)                                      -> ExecResult
session.exec_budgeted(command, &budget)                    -> ExecResult
session.checkpoint(label?) / restore(id) / restore_last()  -> CheckpointId / restore report

Runnable examples live in the repository:

cargo run --example local
EXECKIT_SSH="user:password@host:22" cargo run --example ssh

Full API docs are on docs.rs/execkit.

Python SDK

execkit-py wraps the same Rust core with a Python API, published to PyPI as execkit.

pip install execkit

from execkit import Session, Policy

s = Session.local(policy=Policy(allow=[], deny=["rm"]))
r = s.exec("echo hi; echo err 1>&2; cd /tmp")
print(r.stdout, r.exit_code, r.cwd)   # "hi" 0 "/tmp"
s.close()

The result object mirrors the Rust ExecResult: split stdout / stderr, exit_code, duration_ms, cwd, and truncated, already ANSI-stripped and secret-redacted. State persists across exec calls on the same session.

SSH and Docker sessions work through Session.ssh(...) / Session.docker(...) with the same options as Transports. Output budgets are keyword arguments (tail, head, grep, max_chars) on the session constructors and on exec. Checkpoints are not exposed in the Python SDK yet; use the Rust library or the MCP server for those.

Wheels ship for Linux and macOS, so no Rust toolchain is needed to install.