execkit
Stateful, structured, safe shell sessions for AI agents, on real infrastructure.
When you give an AI agent a raw shell, you get a black box: one-shot commands with no memory between them, a wall of mixed stdout and stderr the agent has to guess its way through, secrets landing in plaintext, no record of what ran, and no way to undo a mistake. execkit replaces that with a session abstraction built for agents.
- Stateful sessions. cd and environment persist across calls, like a real terminal, instead of every command starting fresh from home.
- Structured results. Each command returns split stdout/stderr, an exit code, duration, and cwd as data the agent can act on, not a blob to parse.
- Safe by default. Output is ANSI-stripped, secret-redacted, and bounded so a noisy build cannot blow the agent’s context window or leak credentials.
- Real transports. Local shell, SSH, and Docker, with host-key verification and a sandboxed key directory.
- Undo. Git-backed workspace checkpoints let you snapshot before a risky change and restore on demand (files only, not side effects).
- Observability. An append-only audit log, a live read-only viewer, and live MCP notifications so you can watch what the agent does in the shell.
Two ways to use it
- As an MCP server (execkit-mcp): a stdio Model Context Protocol server that any MCP-capable agent (Claude Code, Cursor, Gemini CLI, and others) can drive directly. Start at Installation.
- As a Rust library (execkit): embed sessions in your own program. See the Rust library page. A Python SDK wraps the same core.
A note on safety
The agent driving these tools can be prompt-injected, so execkit treats every tool argument as untrusted. Anything dangerous to the host is controlled by the operator at startup (environment variables), never by a per-call agent argument. The command allow/deny fence is advisory defense-in-depth, not a sandbox: the real boundary is running the agent as a least-privilege user, in a container, or on a scoped SSH account. See the Security model.
execkit is Apache-2.0 licensed. Source: https://github.com/blinkingbit-oss/execkit.
Installation
execkit-mcp ships as a prebuilt binary. Pick whichever fits your toolchain:
pip install execkit-mcp # a wheel; no Rust toolchain needed
cargo install execkit-mcp # ...or via cargo
Building from source instead:
cargo build -p execkit-mcp --release # binary at target/release/execkit-mcp
Verify the install
execkit-mcp --version
execkit-mcp doctor
doctor reports what is configured and what is missing before you ever wire an
agent in: whether an audit destination is set and writable, where the SSH key
directory and known_hosts resolve to, and whether the Docker daemon is
reachable. A typical run:
execkit-mcp 0.8.0
[ -- ] binary: /home/you/.cargo/bin/execkit-mcp
[ -- ] audit: off (set EXECKIT_MCP_AUDIT or EXECKIT_MCP_AUDIT_DIR to record + watch activity)
[ ok ] ssh key dir: /home/you/.ssh (override: EXECKIT_MCP_KEY_DIR)
[ ok ] known_hosts: /home/you/.ssh/known_hosts
[ ok ] docker: daemon reachable
Each [warn] or [ -- ] line tells you what to set. None of these are required
to start; they only enable optional features (auditing, SSH, Docker).
Where the binary lives
cargo install puts it at ~/.cargo/bin/execkit-mcp. If that is not on your MCP
client’s PATH, use the full path when you register it (the next page shows how,
and execkit-mcp setup fills the absolute path in for you).
Next: Wiring into an agent.
Wiring into an agent
execkit-mcp is a stdio MCP server. You register the installed binary with your
client once, and the agent gains session_create, session_exec, and the rest.
The fastest path is to let execkit print the exact config with the binary’s absolute path already filled in:
execkit-mcp setup claude # or: cursor | gemini
It prints a ready-to-use block (and, for Claude Code, the one-line command). It deliberately does not edit your client’s config file for you, so it can never corrupt one; you paste the block into the right place.
Claude Code
One command:
claude mcp add execkit -- execkit-mcp # add `-s user` to enable it everywhere
Cursor and Gemini CLI
Cursor reads ~/.cursor/mcp.json (or .cursor/mcp.json in a project); Gemini CLI
reads ~/.gemini/settings.json. Add the same block to either:
{
  "mcpServers": {
    "execkit": { "command": "execkit-mcp" }
  }
}
If the binary is not on the client’s PATH, use the absolute path that
execkit-mcp setup printed.
Turning on operator settings
Anything that affects the host (auditing, SSH key location, session limits) is
configured by you, the operator, through environment variables in the client
config, not by the agent. Add an env block:
{
  "mcpServers": {
    "execkit": {
      "command": "execkit-mcp",
      "env": { "EXECKIT_MCP_AUDIT": "/var/log/execkit.jsonl" }
    }
  }
}
See the Security model for the full list of settings and
why they live with the operator. Once wired, the agent calls session_create ->
session_exec -> session_destroy; see Sessions.
Sessions
A session is a live shell that outlives a single tool call. The agent opens one, runs as many commands as it likes against it (state carries over), and closes it.
The tools
| Tool | Arguments | Returns |
|---|---|---|
| session_create | transport ("local" / "ssh" / "docker") plus transport options (see Transports); optional allow / deny lists; optional output_budget | { "session_id": "..." } |
| session_exec | session_id, command, optional budget | structured ExecResult |
| session_destroy | session_id | { "destroyed": true } |
Remote sessions add session_checkpoint, session_checkpoints, and
session_restore; see Checkpoints.
State persists
Sessions are stateful. cd, exported variables, and shell state carry across
session_exec calls, the way a real terminal works:
// session_exec {"session_id":"1_local","command":"cd /srv/app && export ENV=prod"}
// session_exec {"session_id":"1_local","command":"pwd"} -> stdout: "/srv/app"
// session_exec {"session_id":"1_local","command":"echo $ENV"} -> stdout: "prod"
This is the difference between a shell and a series of unrelated strangers: an
agent that runs cd packages/api and then npm test gets the test run in
packages/api, not back in the home directory.
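The contrast can be sketched in a few lines of Python. This is an illustration of the idea only, not execkit's implementation: the PersistentShell class, the sentinel string, and the EXECKIT_SKETCH_ENV variable are all hypothetical names chosen for the demo, and it assumes a POSIX sh is on PATH.

```python
import subprocess


def one_shot(command: str) -> str:
    """Run a command in a brand-new shell; nothing survives the call."""
    done = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
    return done.stdout.strip()


class PersistentShell:
    """Keep one shell process alive and feed it commands over stdin."""

    MARKER = "__SKETCH_DONE__"  # hypothetical end-of-output sentinel

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            ["sh"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )

    def exec(self, command: str) -> str:
        # Run the command, then echo the sentinel so we know where output ends.
        self.proc.stdin.write(f"{command}\necho {self.MARKER}\n")
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if not line or line.rstrip("\n") == self.MARKER:
                break
            lines.append(line)
        return "".join(lines).strip()

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.stdout.close()
        self.proc.wait()


# One-shot calls: the export is gone by the time the second command runs.
one_shot("export EXECKIT_SKETCH_ENV=prod")
lost = one_shot("echo ${EXECKIT_SKETCH_ENV:-unset}")

# Persistent shell: the export is still there on the next call.
shell = PersistentShell()
shell.exec("export EXECKIT_SKETCH_ENV=prod")
kept = shell.exec("echo $EXECKIT_SKETCH_ENV")
shell.close()

print(lost, kept)
```

The sentinel trick assumes the command's output ends with a newline; a real session layer needs something more robust, plus separate stderr capture and exit codes.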
Structured results
session_exec returns an ExecResult as JSON, not a blob:
// session_exec {"session_id":"1_local","command":"npm run build"}
{
  "stdout": "...",
  "stderr": "Error: Cannot find module 'webpack'",
  "exit_code": 1,
  "duration_ms": 3420,
  "cwd": "/home/u/app",
  "truncated": false
}
stdout and stderr are split, so the agent never has to guess whether output was an error. The exit code is authoritative. Output is ANSI-stripped and secret-redacted before it is returned, and bounded so one command cannot flood the agent’s context (see Output budgets).
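As a rough sketch of what "data the agent can act on" means in practice, the snippet below parses the sample result above and branches on its fields. The summarize helper is hypothetical; only the field names come from the ExecResult shown here.

```python
import json

# The sample ExecResult from above, as a JSON string.
raw = """{
  "stdout": "...",
  "stderr": "Error: Cannot find module 'webpack'",
  "exit_code": 1,
  "duration_ms": 3420,
  "cwd": "/home/u/app",
  "truncated": false
}"""


def summarize(result: dict) -> str:
    """Turn an ExecResult into a one-line status (hypothetical helper)."""
    failed = result["exit_code"] != 0
    status = f"failed (exit {result['exit_code']})" if failed else "ok"
    # On failure the interesting text is on stderr; otherwise use stdout.
    stream = result["stderr"] if failed else result["stdout"]
    first_line = stream.splitlines()[0] if stream else ""
    note = " [output truncated]" if result["truncated"] else ""
    return f"{status} in {result['duration_ms']}ms at {result['cwd']}: {first_line}{note}"


result = json.loads(raw)
print(summarize(result))
```

No string-sniffing is needed to decide that the build failed: the exit code settles it, and stderr already holds the error text.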
Session ids are self-identifying
Ids read as <n>_local, <n>_ssh_<user>@<host>[:port], or
<n>_docker_<container>, so logs and the watch viewer
are legible at a glance. Agent-provided host/user/container names are sanitized
before they appear in an id or a filename.
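Because the id encodes the transport and target, tooling that reads logs can split it back apart. The parser below is a sketch built only from the three id shapes listed above; the SessionId type and parse_session_id function are hypothetical names, and it assumes the sanitized target never needs further unescaping.

```python
import re
from typing import NamedTuple, Optional


class SessionId(NamedTuple):
    index: int
    transport: str
    target: Optional[str]  # user@host[:port] or container name; None for local


_ID = re.compile(r"^(?P<n>\d+)_(?P<transport>local|ssh|docker)(?:_(?P<target>.+))?$")


def parse_session_id(value: str) -> SessionId:
    match = _ID.match(value)
    if match is None:
        raise ValueError(f"not a session id: {value!r}")
    transport, target = match["transport"], match["target"]
    # Local ids carry no target; ssh and docker ids must carry one.
    if (transport == "local") != (target is None):
        raise ValueError(f"malformed session id: {value!r}")
    return SessionId(int(match["n"]), transport, target)


for sid in ["1_local", "2_ssh_deploy@web-01:2222", "3_docker_app-web-1"]:
    print(parse_session_id(sid))
```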
Lifecycle and limits
Sessions are reaped when idle (default 30 minutes) to free the process and a slot
against the concurrent-session cap (default 64). Both are operator-tunable; see
the Security model. Always session_destroy when done.
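The interaction between the idle timeout and the session cap can be modelled with a small registry. This is a sketch of the policy described above, not execkit's code: SessionRegistry and its methods are hypothetical, and the clock is injectable so the example runs without waiting.

```python
import time


class SessionRegistry:
    """Track last activity per session; reap idle ones and enforce a cap."""

    def __init__(self, max_sessions=64, ttl_seconds=1800, clock=time.monotonic):
        self.max_sessions = max_sessions
        self.ttl_seconds = ttl_seconds  # 0 disables reaping
        self.clock = clock
        self.last_used = {}  # session_id -> time of last activity

    def reap(self):
        """Drop sessions idle longer than the TTL; return their ids."""
        if self.ttl_seconds == 0:
            return []
        now = self.clock()
        idle = [sid for sid, seen in self.last_used.items()
                if now - seen > self.ttl_seconds]
        for sid in idle:
            del self.last_used[sid]
        return idle

    def open(self, session_id):
        self.reap()  # free idle slots before checking the cap
        if len(self.last_used) >= self.max_sessions:
            raise RuntimeError("concurrent-session cap reached")
        self.last_used[session_id] = self.clock()

    def touch(self, session_id):
        self.last_used[session_id] = self.clock()

    def destroy(self, session_id):
        self.last_used.pop(session_id, None)


# Simulated time: a one-element list the lambda reads from.
now = [0.0]
registry = SessionRegistry(max_sessions=2, clock=lambda: now[0])
registry.open("1_local")
now[0] = 1000.0
registry.open("2_local")
now[0] = 2000.0
reaped = registry.reap()  # 1_local has been idle 2000s, 2_local only 1000s
print(reaped, sorted(registry.last_used))
```

An explicit destroy frees the slot immediately instead of waiting for the reaper, which is why destroying sessions when done is still the better habit.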
Transports
session_create takes a transport: "local", "ssh", or "docker".
Local
A shell on the machine running the server.
// session_create {"transport":"local"} -> {"session_id":"1_local"}
The agent reaches whatever the server’s user can. Run that user with least privilege.
SSH
// session_create {
// "transport":"ssh", "host":"web-01", "user":"deploy",
// "key_path":"deploy_ed25519" // or "password":"..."
// }
Required: host, user, and one of password or key_path. Optional: port
(default 22) and fingerprint to pin an exact host key.
Host-key handling is safe by default:
- Verified against known_hosts (TOFU). A changed key is rejected as a likely man-in-the-middle. The file is ~/.ssh/known_hosts unless EXECKIT_MCP_KNOWN_HOSTS overrides it, and the first connection records the key.
- Pin a key by passing fingerprint for an exact match.
- key_path is sandboxed. It must canonicalize to inside the key directory (~/.ssh by default, or EXECKIT_MCP_KEY_DIR). Out-of-bounds or traversal paths are rejected with a generic error that does not leak whether the path exists.
The home directory behind ~/.ssh resolves by priority ($HOME, then the
system passwd database), so defaults are correct even when $HOME is unset, as
in a service-launched server.
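The key-directory rules above (resolve the home directory, canonicalize, check containment, fail generically) can be sketched as follows. This shows the general technique under stated assumptions and is not execkit's implementation: home_dir, key_dir, and resolve_key are hypothetical helpers, and the pwd module makes the sketch Unix-only.

```python
import os
import pwd
import tempfile
from pathlib import Path


def home_dir() -> Path:
    """$HOME first, then the passwd database (covers service-launched servers)."""
    home = os.environ.get("HOME")
    if home:
        return Path(home)
    return Path(pwd.getpwuid(os.getuid()).pw_dir)


def key_dir() -> Path:
    override = os.environ.get("EXECKIT_MCP_KEY_DIR")
    return Path(override) if override else home_dir() / ".ssh"


def resolve_key(key_path: str, root: Path) -> Path:
    """Canonicalize key_path and require it to be a file inside root."""
    root = root.resolve()
    # resolve() collapses ".." and follows symlinks, so a link that points
    # outside the directory is caught by the containment check too.
    candidate = (root / key_path).resolve()
    # One message for "escapes the directory" and "does not exist", so a
    # probing caller learns nothing about the filesystem.
    if root not in candidate.parents or not candidate.is_file():
        raise ValueError("invalid key_path")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "deploy_ed25519").write_text("not a real key")
    print(resolve_key("deploy_ed25519", root).name)
    for attempt in ["../deploy_ed25519", "/etc/passwd", "missing_key"]:
        try:
            resolve_key(attempt, root)
        except ValueError as err:
            print(attempt, "->", err)
```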
For throwaway or test hosts only, EXECKIT_MCP_INSECURE_ACCEPT_ANY_HOSTKEY=1
disables host-key verification. Never use it in production.
Docker
// session_create {"transport":"docker","container":"app-web-1"}
Runs docker exec against any container the daemon can see, so the agent reaches
whatever your Docker context exposes. Grant the server Docker access only when you
want that, and scope the daemon or context accordingly.
Remote workspace undo
SSH and Docker sessions support Checkpoints: a git-backed snapshot of the workspace you can restore on demand.
Output budgets
A noisy command can dump thousands of lines. That hurts twice: it costs the agent context window, and the volume itself degrades the agent’s reasoning. Output budgets shape a command’s output before it reaches the model.
Pass budget to session_exec, or output_budget to session_create for a
session default:
// keep only the last 200 lines of a noisy build
{ "session_id": "1_local", "command": "npm run build",
"budget": { "keep": { "mode": "tail", "n": 200 } } }
// grep a 50k-line log for errors, with 2 lines of context around each
{ "session_id": "1_local", "command": "cat big.log",
"budget": { "grep": { "pattern": "error|fail", "context": 2 } } }
Keep modes are tail, head, and head_tail; grep is a separate filter; both honor a max_chars cap.
Shaping is line-based, applied client-side, and runs after secret redaction.
It never changes the exit code or any side effect of the command, only what text
comes back. When a budget is applied, the result carries a budget report so the
agent knows the output was shaped:
"budget": {
"stdout": { "mode": "tail", "lines_total": 4123, "lines_kept": 200 },
"stderr": { "mode": "tail", "lines_total": 12, "lines_kept": 12 }
}
Use budgets liberally on commands you expect to be loud (builds, installs, big log reads); the agent keeps the signal without the noise.
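To make the shaping concrete, here is a minimal line-based sketch of the tail and head keep modes and of grep with context, together with a report in the shape shown above. It is an approximation for illustration: keep_lines, grep_lines, and report are hypothetical helpers, head_tail is left out because its exact split is not specified here, and max_chars is not modelled.

```python
import re


def keep_lines(lines, mode, n):
    """Keep the first or last n lines."""
    if mode == "tail":
        return lines[-n:] if n > 0 else []
    if mode == "head":
        return lines[:n]
    raise ValueError(f"mode not covered by this sketch: {mode}")


def grep_lines(lines, pattern, context=0):
    """Keep matching lines plus `context` lines on either side of each."""
    matcher = re.compile(pattern)
    wanted = set()
    for index, line in enumerate(lines):
        if matcher.search(line):
            start = max(0, index - context)
            stop = min(len(lines), index + context + 1)
            wanted.update(range(start, stop))
    return [lines[i] for i in sorted(wanted)]


def report(mode, total, kept):
    return {"mode": mode, "lines_total": total, "lines_kept": len(kept)}


# A noisy 4123-line build, trimmed to its last 200 lines.
build = [f"line {i}" for i in range(4123)]
tail = keep_lines(build, "tail", 200)
print(report("tail", len(build), tail))

# A small log filtered for errors with one line of context.
log = ["boot", "load config", "error: disk full", "retry", "ok", "done"]
print(grep_lines(log, "error|fail", context=1))
```

Note that shaping only selects lines from text that already exists; the command itself ran in full, which is why the exit code and side effects are unaffected.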
Checkpoints
On SSH and Docker sessions, execkit can snapshot the workspace before a risky change and restore it on demand: a filesystem “undo” for agent actions.
It undoes files only, never side effects. A dropped database stays dropped, a sent email stays sent, an installed package stays installed. For “the agent mangled my source tree,” that is exactly the recovery you want.
Tools
| Tool | Arguments | Returns |
|---|---|---|
| session_checkpoint | session_id, optional label | { "checkpoint_id": "..." } |
| session_checkpoints | session_id | [{ id, label, created }] |
| session_restore | session_id, optional checkpoint_id | { restored_to, files_changed } |
Omit checkpoint_id on restore to roll back to the most recent checkpoint.
Enabling it
Two requirements:
- git on the remote host. Checkpoints use a shadow git repo. If git is absent, auto-snapshot disables itself and checkpoint calls return a clear “install git on the remote” error.
- An explicit workspace on session_create. Without it, checkpoints and auto-snapshot are off. execkit will not default to the cwd or $HOME, because snapshotting a home directory is slow and would capture secrets. Set workspace to the project directory you want undo for (pass $HOME explicitly only if you truly mean it).
Control it via session_create:
- workspace (root; REQUIRED to enable checkpoints)
- auto_snapshot (default true; effective only with a workspace)
- paths (sub-directories under the root to track)
- checkpoint_ignores (extra gitignore-style patterns, added to the built-in defaults: .git, node_modules, build dirs, caches, .ssh, .aws, …)
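Put together, a checkpoint-enabled session and a snapshot/restore round trip might look like the tool calls below, built here as plain Python dictionaries and printed as JSON. The argument names come from this page and the Transports page; every value (host, paths, patterns, label, session id) is a made-up example, and the exact value types for paths and checkpoint_ignores are an assumption.

```python
import json

# Arguments for session_create: an SSH session with an explicit workspace.
session_create_args = {
    "transport": "ssh",
    "host": "web-01",
    "user": "deploy",
    "key_path": "deploy_ed25519",
    "workspace": "/srv/app",             # required to enable checkpoints
    "auto_snapshot": True,               # the default, shown for clarity
    "paths": ["src", "config"],          # assumed: list of sub-directories
    "checkpoint_ignores": ["*.log"],     # assumed: list of gitignore-style patterns
}

# A typical sequence: snapshot, make the risky change, roll back if needed.
session_id = "1_ssh_deploy@web-01"       # example id in the documented format
calls = [
    ("session_checkpoint", {"session_id": session_id, "label": "before refactor"}),
    ("session_exec", {"session_id": session_id, "command": "npm run codemod"}),
    ("session_restore", {"session_id": session_id}),  # no checkpoint_id: most recent
]

print(json.dumps(session_create_args, indent=2))
for tool, args in calls:
    print(tool, json.dumps(args))
```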
Restore is destructive
Warning. session_restore reverts tracked files and deletes all untracked files and directories anywhere under the workspace (via git clean), not only files created since the checkpoint. Do not restore if untracked files in the workspace must be preserved.
Security model
The agent driving these tools can be prompt-injected, so execkit treats every tool argument as untrusted. Anything dangerous to the host or filesystem is controlled by the operator at startup through environment variables, never by a per-call agent argument. An injected agent cannot change where the audit log is written, which directory SSH keys come from, or the session limits.
Operator settings
| Env var | Purpose | Default |
|---|---|---|
| EXECKIT_MCP_AUDIT | append a JSONL audit log of every command here | off |
| EXECKIT_MCP_AUDIT_DIR | one JSONL file per session in this directory (<session_id>-<open_ms>.jsonl); takes precedence over EXECKIT_MCP_AUDIT | off |
| EXECKIT_MCP_AUDIT_RETENTION_DAYS | delete per-session log files older than N days at startup (dir mode only); 0 disables | 14 |
| EXECKIT_MCP_KEY_DIR | SSH key_path must canonicalize to inside this dir | ~/.ssh |
| EXECKIT_MCP_KNOWN_HOSTS | SSH host-key verification file (TOFU; rejects changed keys) | ~/.ssh/known_hosts |
| EXECKIT_MCP_INSECURE_ACCEPT_ANY_HOSTKEY | DANGEROUS disable host-key checks | unset |
| EXECKIT_MCP_MAX_SESSIONS | soft cap on concurrent live sessions | 64 |
| EXECKIT_MCP_SESSION_TTL | reap sessions idle longer than N seconds; 0 disables | 1800 |
| EXECKIT_MCP_POLICY_FILE | JSON allow/deny (program names) + deny_patterns (regex) the agent cannot edit; advisory | off |
EXECKIT_MCP_KEY_DIR and EXECKIT_MCP_KNOWN_HOSTS default to paths under the home
directory, which resolves by priority ($HOME, then the passwd database), so the
defaults are correct even when $HOME is unset. Run execkit-mcp doctor to see
what each one resolves to on your machine.
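The "configured once at startup, from the environment" pattern is easy to picture as a frozen settings object. The sketch below reads a few of the variables from the table with their documented defaults; OperatorSettings, load_settings, and audit_destination are hypothetical names, and the real server may parse or validate values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)  # frozen: nothing can change it after startup
class OperatorSettings:
    audit_file: Optional[str]
    audit_dir: Optional[str]
    retention_days: int
    max_sessions: int
    session_ttl: int


def load_settings(env: Mapping[str, str] = os.environ) -> OperatorSettings:
    def number(name: str, default: int) -> int:
        raw = env.get(name, "").strip()
        return int(raw) if raw else default

    return OperatorSettings(
        audit_file=env.get("EXECKIT_MCP_AUDIT") or None,
        audit_dir=env.get("EXECKIT_MCP_AUDIT_DIR") or None,
        retention_days=number("EXECKIT_MCP_AUDIT_RETENTION_DAYS", 14),
        max_sessions=number("EXECKIT_MCP_MAX_SESSIONS", 64),
        session_ttl=number("EXECKIT_MCP_SESSION_TTL", 1800),
    )


def audit_destination(settings: OperatorSettings) -> Optional[Tuple[str, str]]:
    # Directory mode takes precedence when both variables are set.
    if settings.audit_dir:
        return ("dir", settings.audit_dir)
    if settings.audit_file:
        return ("file", settings.audit_file)
    return None


defaults = load_settings({})
both = load_settings({
    "EXECKIT_MCP_AUDIT": "/var/log/execkit.jsonl",
    "EXECKIT_MCP_AUDIT_DIR": "/var/log/execkit/",
    "EXECKIT_MCP_SESSION_TTL": "600",
})
print(defaults)
print(audit_destination(both), both.session_ttl)
```

Because the settings are read from the process environment and never from a tool argument, nothing the agent sends over MCP can reach them.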
What is enforced where
- Host keys are verified by default (TOFU against known_hosts; a changed key is rejected as a likely MITM). Pin an exact key with fingerprint, or set the insecure env var only for throwaway hosts.
- key_path is sandboxed to EXECKIT_MCP_KEY_DIR; traversal or out-of-bounds paths are rejected with a generic error that does not leak path existence.
- The audit destination is operator-chosen, never a tool argument, so an injected agent cannot write to arbitrary files.
- Docker sessions reach any container the daemon can see. Grant Docker access deliberately and scope the context.
- The server speaks MCP on stdout; all diagnostics go to stderr.
The fence is advisory, not a sandbox
allow / deny command lists are defense in depth, not a jail. Matching on
command strings is trivially bypassable (/bin/rm, $(echo rm), base64,
bash -c "..."). Treat the fence as a guardrail against accidents and obvious
mistakes.
The real security boundary is the operating system: run the agent’s shell as a least-privilege user, in a container, or on a scoped SSH account, so that even a fully compromised agent can only reach what that account can. execkit gives you visibility and undo on top of that boundary; it does not replace it.
Operator command policy
Point EXECKIT_MCP_POLICY_FILE at a JSON file to set an allow/deny fence the
agent cannot edit (unlike the per-call allow/deny, which the agent supplies):
{
  "allow": ["git", "ls", "npm"],
  "deny": ["rm", "dd", "shutdown"],
  "deny_patterns": ["\\brm\\b", "kubectl\\s+delete", "git\\s+push\\s+.*--force"]
}
- allow (program names): if non-empty, only these may run. Empty/absent = all.
- deny (program names): always blocked; deny wins over allow.
- deny_patterns (regex over the whole command): for what names cannot express.
Prefer a deny_pattern over a name deny for anything that matters: name
matching only sees the program name per pipeline segment, so deny: ["rm"] misses
sudo rm and xargs rm, while deny_patterns: ["\\brm\\b"] catches them. In JSON
the regex backslashes double up (\\b); use (?i) for case-insensitive matching.
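The difference between name matching and pattern matching is easiest to see side by side. The checker below is a simplified model written for this page, not execkit's matcher: program_names and check are hypothetical, and the segment splitting (on |, ||, &&, and ;) is an assumption about what "per pipeline segment" covers.

```python
import re


def program_names(command):
    """First word of each pipeline segment (split on |, ||, &&, ;)."""
    segments = re.split(r"\|\||&&|[|;]", command)
    return [segment.split()[0] for segment in segments if segment.strip()]


def check(command, policy):
    """Return the rule that blocks the command, or None if it may run."""
    names = program_names(command)
    deny = set(policy.get("deny", []))
    allow = set(policy.get("allow", []))
    if any(name in deny for name in names):
        return "deny"  # deny wins over allow
    if allow and any(name not in allow for name in names):
        return "allow"  # a non-empty allow list is exclusive
    for pattern in policy.get("deny_patterns", []):
        if re.search(pattern, command):
            return "deny_patterns"
    return None


names_only = {"deny": ["rm"]}
with_pattern = {"deny": ["rm"], "deny_patterns": [r"\brm\b"]}

for command in ["rm -rf build", "sudo rm -rf build", "ls | xargs rm"]:
    print(command, "->", check(command, names_only), "/", check(command, with_pattern))
```

In this model the name rule only ever sees sudo and xargs in the last two commands, so only the pattern stops them. The same model would still be fooled by an encoded or indirectly built command, which is the point of calling the fence advisory.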
A blocked command never runs; it is recorded in the audit log, shown in watch,
and pushed to the client as a warning. This is an ADVISORY guardrail, not a
sandbox: string matching is trivially bypassable (/bin/rm, base64, bash -c).
The real boundary is a least-privilege user, a container, or a scoped SSH account.
Auditing and the watch viewer
execkit can record everything an agent does in the shell, and let you watch it live.
The audit log
Point the server at a destination and every open / exec / close event is
appended as JSON, with the session id, transport, an epoch-millisecond timestamp,
and the command plus its (redacted, bounded) output.
- EXECKIT_MCP_AUDIT=/var/log/execkit.jsonl writes one shared file for all sessions.
- EXECKIT_MCP_AUDIT_DIR=/var/log/execkit/ writes one file per session, named <session_id>-<open_ms>.jsonl. This mode takes precedence when both are set.
- EXECKIT_MCP_AUDIT_RETENTION_DAYS (default 14, 0 disables) prunes per-session files older than N days at startup. Files with a future-skewed mtime are never deleted.
The audit destination is operator-chosen and never a tool argument, so an injected agent cannot redirect or suppress it.
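The retention rule for directory mode (prune files older than N days, leave anything with a future mtime alone, 0 disables) can be sketched like this. The prune function is a hypothetical stand-in written from that description, not the server's code, and the file names are examples in the documented pattern.

```python
import os
import tempfile
import time
from pathlib import Path

DAY = 86400  # seconds


def prune(audit_dir: Path, retention_days: int, now: float = None) -> list:
    """Delete per-session logs older than retention_days; return their names."""
    if retention_days == 0:  # 0 disables pruning
        return []
    now = time.time() if now is None else now
    cutoff = now - retention_days * DAY
    removed = []
    for path in sorted(audit_dir.glob("*.jsonl")):
        mtime = path.stat().st_mtime
        if mtime > now:
            continue  # future-skewed clock: never delete
        if mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    now = time.time()
    ages = {
        "1_local-1700000000000.jsonl": now - 20 * DAY,  # older than 14 days
        "2_local-1700000000001.jsonl": now - 1 * DAY,   # recent
        "3_local-1700000000002.jsonl": now + 1 * DAY,   # mtime in the future
    }
    for name, mtime in ages.items():
        (directory / name).write_text("{}\n")
        os.utime(directory / name, (mtime, mtime))
    removed = prune(directory, 14, now=now)
    remaining = sorted(p.name for p in directory.iterdir())
    print(removed, remaining)
```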
The watch viewer
Point watch at the audit file or directory from another terminal:
execkit-mcp watch /var/log/execkit.jsonl # or: execkit-mcp watch (uses $EXECKIT_MCP_AUDIT)
execkit-mcp watch /var/log/execkit/ # a directory; uses $EXECKIT_MCP_AUDIT_DIR
It is a live, read-only TUI: the agent’s sessions on the left, the selected
session’s shell transcript on the right (prompt, command, stdout, stderr in red,
exit status), rendered like a normal shell rather than JSON. Switch sessions with
1-9 or the arrow keys, scroll with PgUp/PgDn, quit with q. It only ever
reads the log. Because the data comes from the server, it works the same under
any MCP client.
Browser viewer
For a richer view in a normal browser tab, serve the transcript as a local web page:
execkit-mcp watch --serve /var/log/execkit/ # prints a loopback URL with a token
execkit-mcp watch --serve --open /var/log/execkit/ # ...and open it in your browser
The MCP server also starts the viewer automatically when EXECKIT_MCP_WATCH_WEB
is set: it binds 127.0.0.1 only, prints the tokened URL and pushes it to the
client as a notification, and keeps the URL stable across restarts so an open tab
reconnects. EXECKIT_MCP_WATCH_PORT sets the port (default 7878, falls back to
a random one if taken) and EXECKIT_MCP_WATCH_OPEN also opens the browser for
you.
The page is read-only and local by construction: it binds loopback only, every request needs the URL token, and it can read the audit stream but never touch a session, a command, or your files. What you get:
- Sidebar grouped by transport (local/ssh/docker); a group header shows its session count, a session row shows its command count, and the active session is highlighted.
- Colored transcript with a header legend (cmd/out/err/ok). Click a legend item to show or hide that line type.
- Search: press / to find within the transcript, step matches with Enter/Shift+Enter, and press e to jump to the next error or blocked line.
- History of past sessions (newest first, with relative times) when EXECKIT_MCP_AUDIT_DIR is set; click one to read its transcript. With a single EXECKIT_MCP_AUDIT file there is no per-session history.
- Per-session actions from a 3-dots menu: rename (a display alias), pin, keep, export to .txt/.log/.md/.json, and screenshot to .png.
- A status bar that shows the selected session’s details; click it to copy the session id.
Rename / pin / keep and the sidebar width persist in ~/.execkit/viewer-state.json
(mode 0600). That file is the viewer’s only write surface: display metadata
only, never able to affect a session, a command, or the audit log.
Headless follow mode
For a pipeable, no-TTY view, use --follow instead of the TUI. It prints each
command and its output as a line prefixed with the session id, as it happens:
execkit-mcp watch --follow /var/log/execkit/
# [1_local] /home/u $ npm run build
# [1_local] x exit 1 (3420ms)
# [2_ssh_deploy@web-01] /srv $ systemctl restart app
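Because the follow stream is plain text, it can be piped into a small filter. The sketch below picks non-zero exits out of lines shaped like the sample above. The line format is inferred from that sample alone and should be treated as an assumption, and the failures function is hypothetical.

```python
import re

# Assumed shape of an exit line: "[<session_id>] <marker> exit <code> (<ms>ms)"
EXIT_LINE = re.compile(
    r"^\[(?P<session>[^\]]+)\] \S+ exit (?P<code>\d+) \((?P<ms>\d+)ms\)$"
)


def failures(lines):
    """Yield (session_id, exit_code, duration_ms) for each non-zero exit."""
    for line in lines:
        match = EXIT_LINE.match(line.rstrip("\n"))
        if match and int(match["code"]) != 0:
            yield match["session"], int(match["code"]), int(match["ms"])


sample = [
    "[1_local] /home/u $ npm run build",
    "[1_local] x exit 1 (3420ms)",
    "[2_ssh_deploy@web-01] /srv $ systemctl restart app",
]
found = list(failures(sample))
print(found)
```

To run it against a live stream, replace sample with sys.stdin and pipe the output of the --follow command into the script.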
Live notifications in the client
Even with no audit log configured, the server streams each command to the MCP
client as it runs, so a host agent can surface its own shell activity without
anyone opening a separate terminal. Every session_exec emits:
- a log notification (notifications/message) carrying the full shell transcript, info on success and warning on a non-zero exit; and
- a progress notification (notifications/progress) with a one-line summary, when the call supplied a progressToken.
This reveals nothing new: the client already receives the same output in the tool result, redacted and bounded. How the activity is surfaced is up to the client.
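On the client side, handling these comes down to dispatching on the JSON-RPC method name. The handler below is a sketch: the params fields used (level and data for log messages; progressToken and message for progress) follow the MCP specification as far as I know, while the payload text in the samples is invented and execkit's actual data contents may differ.

```python
def describe(notification: dict):
    """Render an MCP notification as one display line, or None if unhandled."""
    method = notification.get("method")
    params = notification.get("params", {})
    if method == "notifications/message":
        level = params.get("level", "info")
        return f"[{level}] {params.get('data')}"
    if method == "notifications/progress":
        token = params.get("progressToken")
        return f"[progress {token}] {params.get('message', '')}"
    return None


# Sample payloads; the transcript and summary text are made up for the demo.
samples = [
    {"jsonrpc": "2.0", "method": "notifications/message",
     "params": {"level": "warning", "data": "$ npm run build (exit 1)"}},
    {"jsonrpc": "2.0", "method": "notifications/progress",
     "params": {"progressToken": "t1", "progress": 1, "message": "npm run build: exit 1"}},
]
rendered = [describe(sample) for sample in samples]
print(rendered)
```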
CLI reference
execkit-mcp with no arguments is the stdio MCP server an agent launches.
Everything below is for a human at a terminal.
Commands
execkit-mcp                        Run the MCP server on stdio (default)
execkit-mcp setup <client>         Print the config to wire execkit into a client
                                   client: claude | cursor | gemini
execkit-mcp doctor                 Check the local environment and print a report
execkit-mcp watch [--follow|--serve] [--open] <path>
                                   Live, read-only viewer (TUI, follow stream, or browser)
execkit-mcp --version              Print version
execkit-mcp --help                 Print help
setup <client>
Prints a ready MCP config block with this binary’s absolute path filled in, and
for Claude Code the claude mcp add one-liner. It prints rather than edits your
client’s live config, so it cannot corrupt one. See
Wiring into an agent.
doctor
Reports the resolved audit destination and its writability, the SSH key directory
and known_hosts (with the env var that overrides each), and whether the Docker
daemon is reachable. Use it after install to catch setup problems before an agent
connects. See Installation.
watch [--follow|--serve] [--open] <path>
A live read-only viewer over the audit log, given as a file or a directory. --follow
gives a headless, pipeable stream instead of the TUI; --serve serves a loopback,
token-gated web page instead, and --open also launches your browser at it. See
Auditing and the watch viewer.
Environment
These configure the server (operator-controlled, not agent arguments). Full table and rationale on the Security model page.
EXECKIT_MCP_AUDIT                 Append a JSONL audit log of every command here
EXECKIT_MCP_AUDIT_DIR             One JSONL file per session in this directory
EXECKIT_MCP_AUDIT_RETENTION_DAYS  Prune per-session files older than N days (default 14)
EXECKIT_MCP_WATCH_WEB             Auto-start the loopback browser viewer and surface its URL
EXECKIT_MCP_WATCH_PORT            Port for the browser viewer (default 7878; random if taken)
EXECKIT_MCP_WATCH_OPEN            Also auto-open the browser at the viewer URL (default: link only)
EXECKIT_MCP_KEY_DIR               Directory SSH keys must live under (default ~/.ssh)
EXECKIT_MCP_KNOWN_HOSTS           SSH known_hosts file (default ~/.ssh/known_hosts)
EXECKIT_MCP_MAX_SESSIONS          Soft cap on concurrent live sessions (default 64)
EXECKIT_MCP_SESSION_TTL           Reap sessions idle longer than N seconds (default 1800)
EXECKIT_MCP_POLICY_FILE           JSON allow/deny + deny_patterns the agent cannot edit (advisory)
Rust library
The execkit crate is the core. The MCP server is a thin wrapper over it; you can
embed the same sessions directly in your own program.
[dependencies]
execkit = "0.7" # local + SSH + Docker
# execkit = { version = "0.7", default-features = false } # local + Docker only (no SSH; drops russh/tokio)
use execkit::{Session, Policy};

fn main() -> Result<(), execkit::Error> {
    let mut s = Session::local()?
        .with_policy(Policy { allow: vec![], deny: vec!["rm".into()] });
    let r = s.exec("echo hi; echo err 1>&2; cd /tmp")?;
    // r.stdout == "hi", r.stderr == "err", r.exit_code == 0, r.cwd == "/tmp"
    println!("{} (exit {})", r.stdout, r.exit_code);
    Ok(())
}
State persists across exec calls on the same Session, exactly as it does over
MCP. Results are the same structured ExecResult (split stdout/stderr, exit code,
duration, cwd), ANSI-stripped and secret-redacted.
SSH and Docker sessions are constructed with their configs:
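use execkit::{HostKeyVerification, Session, SshAuth, SshConfig};

// main returns a Result so the `?` operator below compiles.
fn main() -> Result<(), execkit::Error> {
    let cfg = SshConfig::new("web-01".into(), "deploy".into(),
        SshAuth::Password("...".into()),
        HostKeyVerification::KnownHosts("/home/me/.ssh/known_hosts".into()));
    // Connect, then run commands exactly as on a local session.
    let mut s = Session::ssh(cfg)?;
    let r = s.exec("uname -a")?;
    println!("{} (exit {})", r.stdout, r.exit_code);
    Ok(())
}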
The API surface stays small; the richness lives in the result, not the verbs:
Session::local() / ::ssh(cfg) / ::docker(container) -> Session
session.exec(command) -> ExecResult
session.exec_budgeted(command, &budget) -> ExecResult
session.checkpoint(label?) / restore(id) / restore_last() -> CheckpointId / restore report
Runnable examples live in the repository:
cargo run --example local
EXECKIT_SSH="user:password@host:22" cargo run --example ssh
Full API docs are on docs.rs/execkit.
Python SDK
execkit-py wraps the same Rust core with a Python API, published to PyPI as
execkit.
pip install execkit
from execkit import Session, Policy
s = Session.local(policy=Policy(allow=[], deny=["rm"]))
r = s.exec("echo hi; echo err 1>&2; cd /tmp")
print(r.stdout, r.exit_code, r.cwd) # "hi" 0 "/tmp"
s.close()
The result object mirrors the Rust ExecResult: split stdout / stderr,
exit_code, duration_ms, cwd, and truncated, already ANSI-stripped and
secret-redacted. State persists across exec calls on the same session.
SSH and Docker sessions work through Session.ssh(...) / Session.docker(...)
with the same options as Transports. Output budgets are
keyword arguments (tail, head, grep, max_chars) on the session
constructors and on exec. Checkpoints are not exposed in the Python SDK yet;
use the Rust library or the MCP server for those.
Wheels ship for Linux and macOS, so no Rust toolchain is needed to install.