
ZoClone rebuilds Zo Computer’s agent, skills, memory, scheduling, and automation in 775 Python lines

AI-Generated Summary


20 hours ago



Key Points

ZoClone is a personal Python rebuild that aims to replicate Zo Computer’s subsystem “shape” using a single orchestrator class and ~775 lines of code.
Persistent storage in ZoClone uses four SQLite tables for conversations, messages, a key-value memory store, and file contents/metadata.
Skills are auto-discovered from a local directory by reading SKILL.md frontmatter and dynamically importing scripts/<name>.py with a run/execute handler.
Parallel agent execution is implemented by sending concurrent HTTP requests to Zo’s /zo/ask endpoint and aggregating results with asyncio.gather.
The compute pool and scheduler are implemented with lightweight in-process mechanisms: a priority FIFO assignment under a threading.Lock and an rrule-based polling loop over due tasks in SQLite.

Two posts describe ZoClone, a personal Python rewrite intended to recreate the “shape” of Zo Computer without Docker or a web framework. The author builds a small codebase (about 775 lines across roughly ten modules) centered on a single orchestrator class that owns a SQLite database connection, a ThreadPoolExecutor, and references to the memory, automation, hosting, and compute subsystems. Persistent state lives in four SQLite tables (conversations, messages, a memory key-value store, and files). Skills are discovered by scanning a local folder: each skill has a SKILL.md file with YAML-like frontmatter (name, description, triggers) and a corresponding scripts/<name>.py module loaded via importlib. The agent manager runs agents in parallel by calling Zo’s /zo/ask HTTP API and awaiting the requests concurrently with asyncio.gather. The two posts describe memory recall differently: the first calls it a LIKE '%query%' scan, while the second describes a TF-IDF-style similarity search whose interface is designed to be swapped later for real embeddings. Scheduling uses a simple rrule-based cadence and a background thread that finds due tasks in SQLite and advances next_run. A compute pool keeps jobs and nodes in in-process dictionaries guarded by a single lock and assigns work in priority-then-FIFO order. The author also lists deliberate omissions and trade-offs, including no sandboxing, no multi-user authentication, no streaming, and no semantic vector search.

How Outlets Covered This Story


Dev.to

I rebuilt Zo Computer from scratch in 775 lines of Python — here's what stuck and what snapped

Zo Computer gives you an AI agent, a skills registry, a compute pool, browser automation, file hosting, scheduled automations, and persistent memory, all on a personal server. I wanted to understand every seam, so I rebuilt the whole thing in vanilla Python 3 with no web framework and no Docker. The result is ZoClone: 10 modules, 775 lines, 4 SQLite tables, one ThreadPoolExecutor. This is what the architecture actually looks like when you strip out the platform.

The whole orchestrator in one class

The main module centers on ZoClone.__init__, and that's the entire dependency graph. Each subsystem is an attribute:

```python
from concurrent.futures import ThreadPoolExecutor

# init_db, pool, hosting, memory and scheduler come from ZoClone's other modules
class ZoClone:
    def __init__(self):
        self.db = init_db()
        self.executor = ThreadPoolExecutor(max_workers=10)
        self.ai_client = None
        self.pool = pool            # ComputePool singleton
        self.hosting = hosting      # HostingService singleton
        self.memory = memory        # SQLite-backed memory
        self.scheduler = scheduler  # cron-like automations
```

No DI container, no event bus, no message queue. Every tool is a method on the same object. If you're coming from a microservice background, this will look like a 2014 Django app, and that's the point. When the whole mental model fits on one screen, you stop second-guessing where a bug lives.

The SQLite schema is the truth

Four tables. No ORM. No migrations. The schema is a single executescript block:

```sql
CREATE TABLE IF NOT EXISTS conversations(id TEXT PRIMARY KEY, title TEXT, updated_at INTEGER);
CREATE TABLE IF NOT EXISTS messages(id TEXT PRIMARY KEY, conv_id TEXT, role TEXT, content TEXT, tools TEXT, created_at INTEGER);
CREATE TABLE IF NOT EXISTS memory(id TEXT PRIMARY KEY, key TEXT UNIQUE, value TEXT, updated_at INTEGER);
CREATE TABLE IF NOT EXISTS files(id TEXT PRIMARY KEY, path TEXT UNIQUE, content TEXT, encoding TEXT, updated_at INTEGER);
```

IDs are SHA-256 hashes of (timestamp, content) truncated to 24 characters. The tools column on messages is a freeform JSON blob.
The memory table is a key-value store with UNIQUE on key, which forces last-write-wins semantics. When your entire data model is four tables, schema design becomes a five-minute conversation instead of a five-day one.

The skills system is just frontmatter + importlib

Skills in Zo are a folder with a SKILL.md (frontmatter) and a scripts/<name>.py (handler). I auto-discover them at import time:

```python
import importlib.util
from pathlib import Path

def load_skill(name: str, path: Path) -> Skill:
    md_content = path.read_text()
    # parse YAML-ish frontmatter between --- markers
    frontmatter = {}
    if md_content.startswith("---"):
        end = md_content.find("---", 3)
        if end != -1:  # skip unterminated frontmatter instead of slicing with -1
            for line in md_content[3:end].strip().split("\n"):
                if ":" in line:
                    k, v = line.split(":", 1)
                    frontmatter[k.strip()] = v.strip()
    # the handler lives next to SKILL.md in scripts/<name>.py
    py_file = path.parent / "scripts" / f"{name}.py"
    spec = importlib.util.spec_from_file_location(name, py_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # prefer run(), fall back to execute()
    handler = getattr(module, "run", getattr(module, "execute", None))
    return Skill(name=name, description=..., triggers=..., handler=handler)
```

No registry service, no API call to discover skills. The filesystem is the registry: drop a folder, restart, and it's loaded. The triggers field in frontmatter is just a comma-separated string; the LLM gets all skill descriptions in its system prompt and decides which one to call. There's no embedding-based retrieval because, at 30 skills, exact-match triggers work fine.

Compute pool: priority queue with a single lock

The peer-to-peer compute mesh in ZoClone is a dict of jobs, a dict of nodes, and one threading.Lock:

```python
from typing import Dict, Optional

def assign_job(self, node_id: str) -> Optional[Dict]:
    with self.lock:
        pending = [j for j in self.jobs.values() if j["status"] == "pending"]
        if not pending:
            return None
        # sort is stable, so jobs with equal priority keep their submission order
        pending.sort(key=lambda x: -x["priority"])
        job = pending[0]
        job["status"] = "assigned"
        job["assigned_node"] = node_id
        return job
```

That's it. When a node polls, the hub picks the highest-priority pending job, marks it as assigned to that node, and returns the work.
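To see the lock-and-sort pool work end to end, here is a minimal self-contained version. The register_node() and submit() helpers, and the job fields beyond status, priority, and assigned_node, are hypothetical. It also swaps the full sort for max(), which returns the first maximal item it encounters, so ties still resolve in submission order while each assignment costs O(n) instead of O(n log n).

```python
import itertools
import threading
from typing import Optional


class ComputePool:
    """Hypothetical minimal pool: in-process job and node dicts guarded by one lock."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.jobs: dict[str, dict] = {}   # dicts keep insertion (submission) order
        self.nodes: dict[str, dict] = {}
        self._ids = itertools.count(1)

    def register_node(self, node_id: str) -> None:
        with self.lock:
            self.nodes[node_id] = {"status": "idle"}

    def submit(self, payload: str, priority: int = 0) -> str:
        with self.lock:
            job_id = f"job-{next(self._ids)}"
            self.jobs[job_id] = {"id": job_id, "payload": payload, "priority": priority,
                                 "status": "pending", "assigned_node": None}
            return job_id

    def assign_job(self, node_id: str) -> Optional[dict]:
        with self.lock:
            pending = [j for j in self.jobs.values() if j["status"] == "pending"]
            if not pending:
                return None
            # max() returns the first maximal job, so equal priorities stay FIFO
            job = max(pending, key=lambda j: j["priority"])
            job["status"] = "assigned"
            job["assigned_node"] = node_id
            if node_id in self.nodes:
                self.nodes[node_id]["status"] = "busy"
            return job
```

Submitting jobs with priorities 1, 5, 5 and polling repeatedly hands out the two priority-5 jobs in submission order, then the priority-1 job, then None. Like the original, all state is lost if the process exits.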
No Redis Streams, no RabbitMQ, no Kafka. The trade-off is obvious: this is a single-process orchestrator, not a horizontally scalable scheduler. But for a 50-node grid running nightly ML batch jobs, you don't need Kafka. You need a lock and a sort. GPU tier multipliers, regional pricing, and reputation decay are all JSON fields in the nodes dict. When you need to add a new pricing rule, you change one line of assign_job. Compare that to a Kubernetes operator with custom resource definitions, admission webhooks, and reconciler loops.

The agent manager is just async gather

Zo has a /zo/ask API that spawns child agent invocations. The clone just calls it:

```python
import asyncio
import aiohttp

async def spawn(self, agent_id: str, prompt: str):
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.zo.computer/zo/ask",
            headers={"authorization": self.api_token, "content-type": "application/json"},
            json={"input": prompt, "model_name": self.model},
        ) as resp:
            return {"agent_id": agent_id, "output": (await resp.json()).get("output", "")}

async def spawn_all(self, agents: list):
    # run every spawn() concurrently and collect results in input order
    return await asyncio.gather(*[self.spawn(a["id"], a["prompt"]) for a in agents])
```

Running five agent invocations in parallel is one asyncio.gather call. No Celery, no RQ, no Dask. The model name is a single fixed setting: there's exactly one LLM driver, and it's whatever Zo gives you. If you want a different model, change one string.

The honest list of things that broke

- No sandboxing. run_command is subprocess.run(cmd, shell=True). The agent can rm -rf ~ and it will. Production Zo wraps this in gVisor; I don't.
- No embedding search. Memory recall is a LIKE '%query%' scan. Fine at 1k rows, embarrassing at 100k.
- No streaming. Every chat() call is blocking: you see the full response or nothing.
- No auth. set_key() writes API keys to a flat JSON file in ~/.zoclone/. Multi-user means multi-disaster.
- No tests. The whole codebase is a personal learning exercise. There is one if __name__ == "__main__" block that prints the pool status.
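The spawn_all fan-out from the agent manager section can be exercised without a network. This is a minimal sketch, assuming a hypothetical fake_ask() coroutine that stands in for the POST to /zo/ask; it adds an asyncio.Semaphore to cap in-flight requests and passes return_exceptions=True so one failed agent is reported alongside the others instead of propagating out of gather. Neither addition is in the original code.

```python
import asyncio
from typing import Awaitable, Callable


# Hypothetical stand-in for the HTTP call to /zo/ask.
async def fake_ask(prompt: str) -> str:
    await asyncio.sleep(0.01)  # simulate network and model latency
    if "fail" in prompt:
        raise RuntimeError("upstream error")
    return prompt.upper()


async def spawn_all(agents: list[dict], ask: Callable[[str], Awaitable[str]],
                    max_in_flight: int = 5) -> list[dict]:
    sem = asyncio.Semaphore(max_in_flight)  # at most max_in_flight calls at once

    async def spawn(agent: dict) -> dict:
        async with sem:
            output = await ask(agent["prompt"])
        return {"agent_id": agent["id"], "output": output}

    # gather preserves input order; exceptions come back as list items
    results = await asyncio.gather(*(spawn(a) for a in agents), return_exceptions=True)
    return [
        {"agent_id": a["id"], "error": str(r)} if isinstance(r, BaseException) else r
        for a, r in zip(agents, results)
    ]


if __name__ == "__main__":
    agents = [{"id": "a1", "prompt": "hello"}, {"id": "a2", "prompt": "please fail"}]
    print(asyncio.run(spawn_all(agents, fake_ask, max_in_flight=2)))
```

For the real HTTP version, the aiohttp documentation recommends reusing one ClientSession across requests rather than opening a new session per call, as the snippet above does.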
What I'd change if I were building a real product

- Wrap run_command in a gVisor container, or at minimum a chroot + seccomp.
- Swap the memory table for a sqlite-vec (vec0) virtual table and do real semantic recall.
- Replace the lock-and-dict compute pool with a proper work queue (BullMQ, or just Redis Streams).
- Add an Authorization header check on every API endpoint, even on internal services.
- Add a single integration test that runs a real agent loop end-to-end.

The real lesson wasn't "look how short the code is"; it was "look how much of the platform is just a thin layer over a database, a thread pool, and a few HTTP calls." The parts that are genuinely hard (the LLM orchestration loop, the skill discovery) are maybe 100 lines. The rest is plumbing, and most of the plumbing doesn't need to exist.

Repo: github.com/AmSach/ZoClone
License: MIT
Stack: Python 3.10+, SQLite, requests, aiohttp, no web framework

If you've built a personal-AI clone of your own, drop the repo link in the comments. I want to see how other people split the agent loop from the storage layer.

#Python #AI #OpenSource #Architecture #BuildInPublic #SQLite
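For the "Authorization header check on every API endpoint" item in the list above, a minimal sketch, assuming a bearer-token scheme and a shared secret read from a hypothetical ZOCLONE_TOKEN environment variable. It fails closed when no token is configured and uses hmac.compare_digest so the comparison time does not reveal how much of a guessed token was correct.

```python
import hmac
import os
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_token: Optional[str] = None) -> bool:
    """True only for 'Authorization: Bearer <token>' matching the configured token."""
    expected = expected_token if expected_token is not None else os.environ.get("ZOCLONE_TOKEN", "")
    if not expected:
        return False  # fail closed: no configured token means nobody gets in
    # HTTP header names are case-insensitive, so normalise them before lookup
    lowered = {k.lower(): v for k, v in headers.items()}
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # constant-time comparison of the presented token against the secret
    return hmac.compare_digest(token.encode("utf-8"), expected.encode("utf-8"))
```

Calling this at the top of every handler, and returning 401 when it is False, is the cheapest version of the fix; it still assumes a single shared secret, not per-user accounts.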

4 hours ago


Dev.to

I rebuilt Zo Computer's seven subsystems in 800 lines of Python — here's the architecture, the tradeoffs, and what I cut

I've been using Zo Computer as my primary AI workspace for a few months. The piece I kept coming back to wasn't the model but the substrate: the agent manager that spawns parallel sessions, the skills registry that auto-loads SKILL.md files, the memory engine that compresses old context, the rrule-based scheduler, the compute pool that turns idle machines into workers, the BYOK (bring-your-own-key) client that swaps between Groq, OpenAI, and Anthropic, and the headless browser that actually clicks things.

So I asked the obvious question: how much of that is concept and how much is platform glue? Could a single Python package on a laptop give a developer 80% of the same shape?

ZoClone is my answer. Ten files in src/, ~800 lines of dependency-light Python, and every subsystem above is wired up. No daemon, no Docker, no Postgres: just ~/.zoclone/*.db and a ThreadPoolExecutor. Here's the architecture, what I learned about which parts are easy to clone and which ones are doing real work, and the shortcuts I had to take to fit the whole thing in a single repo.

The ten files

```
ZoClone/
├── src/
│   ├── zo.py              # top-level orchestrator + ask() loop
│   ├── agent_manager.py   # parallel async agents via Zo /zo/ask
│   ├── skills.py          # SKILL.md auto-loader + handler dispatch
│   ├── memory.py          # TF-IDF fallback embeddings + context recall
│   ├── automation.py      # rrule scheduler with minute/hour/day cadences
│   ├── compute_pool.py    # node registry + priority FIFO dispatch
│   ├── browser.py         # Playwright headless + navigate/screenshot/eval
│   ├── byok.py            # key vault for Groq/OpenAI/Anthropic/Ollama
│   ├── zo_client.py       # OpenAI-compatible chat() abstraction
│   └── services.py        # process supervisor (start/stop/logs)
```

Total LoC: 775. No __init__.py magic, no metaclass tricks, no plugin discovery beyond a directory scan.
The constraint forced every interface to be a plain function or a class with three methods.

The orchestrator: zo.py

Everything threads through a single ZoClone class that owns the DB connection, a thread pool, and an AIClient that's lazily constructed on the first call to ask().

```python
import os
from concurrent.futures import ThreadPoolExecutor

class ZoClone:
    def __init__(self):
        self.db = init_db()
        self.executor = ThreadPoolExecutor(max_workers=10)
        self.ai_client = None
        self.pool = pool        # module-level singleton
        self.hosting = hosting  # module-level singleton
        self.memory = memory
        self.scheduler = scheduler

    def ask(self, conv_id: str, message: str, provider: str = "groq",
            model: str = "", tools: list[dict] | None = None) -> dict:
        # built once on first use; later calls reuse it even if provider/model differ
        if not self.ai_client:
            key = get_key(provider)
            m = model or PROVIDERS[provider]["models"][0]
            self.ai_client = AIClient(provider, m, key)
        messages = self.memory.get_context(conv_id)
        messages.append({"role": "user", "content": message})
        system = f"You are Sentience, an advanced AI running locally. Workspace: {os.getcwd()}."
        # send the system prompt plus only the 20 most recent messages
        resp = self.ai_client.chat(
            [{"role": "system", "content": system}] + messages[-20:],
            tools or [],
        )
        # ... persist + return
```

The trick is AIClient: it's the only piece that has to be OpenAI-compatible, because Groq, Together, OpenRouter, Ollama, and LM Studio have all converged on the chat completions schema. Anthropic needed a tiny shim, but Groq works out of the box.

The skills system: auto-loading SKILL.md

This is the part I'm proudest of. The directory scan is only a few lines:

```python
def load_all_skills():
    global SKILLS
    SKILLS = {}
    if not SKILL_DIR.exists():
        return
    # every subfolder that contains a SKILL.md is treated as a skill
    for item in SKILL_DIR.iterdir():
        if item.is_dir() and (item / "SKILL.md").exists():
            skill = load_skill(item.name, item / "SKILL.md")
            if skill:
                SKILLS[skill.name] = skill
```

The interesting bit is the SKILL.md parser. It accepts the same frontmatter shape as the Agent Skills spec (name, description, and comma-separated triggers) and looks for scripts/<name>.py to find a run() or execute() callable. That's the entire plugin API.
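A self-contained sketch of the same folder convention (skills/<name>/SKILL.md next to skills/<name>/scripts/<name>.py exposing run() or execute()), so the parser and loader can be tested in one place. The Skill dataclass fields, the skill_<name> module naming, and the rule that skills without a handler are skipped are assumptions for illustration; the frontmatter parser is deliberately "YAML-ish" and does not handle real YAML (lists, quoting, multi-line values).

```python
import importlib.util
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Skill:  # hypothetical shape, not ZoClone's class
    name: str
    description: str
    triggers: list[str] = field(default_factory=list)
    handler: Optional[Callable] = None


def parse_frontmatter(text: str) -> dict:
    """Parse 'key: value' lines between a leading pair of '---' markers."""
    meta: dict = {}
    if not text.startswith("---"):
        return meta
    end = text.find("---", 3)
    if end == -1:  # unterminated frontmatter: treat it as absent
        return meta
    for line in text[3:end].strip().splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta


def load_skill(skill_dir: Path) -> Optional[Skill]:
    meta = parse_frontmatter((skill_dir / "SKILL.md").read_text(encoding="utf-8"))
    py_file = skill_dir / "scripts" / f"{skill_dir.name}.py"
    if not py_file.exists():
        return None
    spec = importlib.util.spec_from_file_location(f"skill_{skill_dir.name}", py_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the skill's top-level code
    handler = getattr(module, "run", None) or getattr(module, "execute", None)
    triggers = [t.strip() for t in meta.get("triggers", "").split(",") if t.strip()]
    return Skill(meta.get("name", skill_dir.name), meta.get("description", ""), triggers, handler)


def load_all_skills(root: Path) -> dict[str, Skill]:
    skills: dict[str, Skill] = {}
    if not root.exists():
        return skills
    for item in sorted(root.iterdir()):  # sorted for a deterministic load order
        if item.is_dir() and (item / "SKILL.md").exists():
            skill = load_skill(item)
            if skill and skill.handler:
                skills[skill.name] = skill
    return skills
```

Note that exec_module() executes arbitrary code from the skill folder at load time, which is exactly the "no per-skill sandbox" price the post describes.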
There's no registration, no decorator, no manifest; drop a folder in skills/ and the next import picks it up. The price: there's no versioning, no dependency declaration, no per-skill sandbox. If you want a skill to be hermetic, you have to do that yourself. For a single-user laptop, that's fine. For a multi-tenant platform, it's not.

The agent manager: parallel aiohttp over /zo/ask

I cheated here, and I'm fine with it. The original "spawn a parallel agent" primitive is itself a remote call to a model, and Zo's /zo/ask endpoint is open to anyone with a token. So:

```python
import asyncio
import aiohttp

async def spawn(self, agent_id: str, prompt: str, callback=None):  # callback is unused here
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.zo.computer/zo/ask",
            headers={"authorization": self.api_token, "content-type": "application/json"},
            json={"input": prompt, "model_name": "vercel:minimax/minimax-m2.7"},
        ) as resp:
            return {"agent_id": agent_id, "output": (await resp.json())["output"]}

async def spawn_all(self, agents: list) -> list:
    return await asyncio.gather(*[self.spawn(a["id"], a["prompt"]) for a in agents])
```

spawn_all fires N concurrent requests, asyncio.gather waits for the slowest, and you get a list of outputs back in the same order as the input. A ThreadPoolExecutor(max_workers=10) is the sync equivalent for callers that don't want to be async. In practice the bottleneck is the model, not the network: 10 parallel calls saturate the rate limiter long before they saturate asyncio.

The memory engine: TF-IDF as a placeholder

I'll be honest: this is the weakest subsystem. embed_tfidf hashes tokens into a 512-dimensional vector, a cosine-similarity function does the math, and recall() returns the top-k nodes whose embeddings are most similar to the query. It works for short prompts and small corpora, but it is not semantic: "database" and "sql" don't cluster the way they would with a real embedding model.
The reason I shipped it anyway: a real embedding model (sentence-transformers, or a remote call) is one swap away, and the interface (memorize(content, meta) -> nid and recall(query, top_k) -> [{id, content, meta}]) doesn't change. When I get around to plugging in nomic-embed-text via Ollama, nothing in zo.py needs to move. The trick was defining the right shape first and being honest about which fields the placeholder is faking.

The scheduler: rrule in 30 lines

The rrule spec is a 50-page document. I needed three frequencies and a count. So:

```python
import re

def parse_rrule(rrule: str) -> dict:
    result = {"interval": 86400, "count": 0}  # default daily
    if "FREQ=DAILY" in rrule:
        result["interval"] = 86400
    elif "FREQ=HOURLY" in rrule:
        result["interval"] = 3600
    elif "FREQ=MINUTELY" in rrule:
        result["interval"] = 60
    if "COUNT=" in rrule:
        m = re.search(r"COUNT=(\d+)", rrule)
        if m:
            result["count"] = int(m.group(1))
    return result
```

A daemon thread wakes once a minute, asks SQLite for rows matching WHERE enabled=1 AND next_run <= now, fires each one's handler, and bumps next_run by the interval. That's the entire automation system. It's missing timezones, exception dates, and DST handling, but for "run this every hour" it is correct and reliable.

The compute pool: priority FIFO over a Python dict

ComputePool keeps self.jobs and self.nodes as in-memory dicts protected by a threading.Lock. Heartbeats update last_heartbeat; dispatch sorts pending jobs by -priority and assigns the top one to the next polling node. Because Python's sort is stable and dicts keep insertion order, jobs with equal priority come out in submission order, which is what makes it FIFO. No leader election, no Raft, no gossip protocol.

```python
def assign_job(self, node_id: str) -> dict | None:
    with self.lock:
        pending = [j for j in self.jobs.values() if j["status"] == "pending"]
        if not pending:
            return None
        pending.sort(key=lambda x: -x["priority"])
        job = pending[0]
        job["status"] = "assigned"
        job["assigned_node"] = node_id
        if node_id in self.nodes:
            self.nodes[node_id]["status"] = "busy"  # mark the polling node busy
        return job
```

This is a real footgun: in-process state means a process restart loses every pending job.
For a real grid you'd want this in Postgres with row-level locks. But for "let me run a job on my second laptop", pip install is the whole onboarding.

What I cut, and why

Three things are not in the package and probably never will be:

- The hosted UI: the chat sidebar, the file tree, the agent picker. ZoClone is a library, not an app. Import zo and call zo.ask(...) from a Flask route, a Tk window, a Discord bot, or a cron job.
- Multi-tenant auth: there's exactly one user. whoami() returns the local username. If you want a team plan, fork the repo.
- A real vector store: TF-IDF is a placeholder. The next iteration swaps it for Ollama's nomic-embed-text (private, free, runs on the same box), and the interface stays the same.

Try it

```shell
git clone https://github.com/AmSach/ZoClone
cd ZoClone && pip install aiohttp playwright
python -m playwright install chromium
python -c "from src.zo import zo; print(zo.ask('test-conv', 'hi'))"
```

If you want a skill added, drop a folder in skills/ with a SKILL.md and scripts/foo.py, then open a PR. I merge within 24 hours. If you find a real bug in one of the seven subsystems, open an issue with a minimal repro; there are only 775 lines to search.

Seven subsystems, one Python process, no cloud dependency. The shape matters more than the scale.

#Python #AI #OpenSource #BuildInPublic #PySide6 #LocalFirst

1 day ago
