If you manage rental properties on Facebook Marketplace (personal profile) and Zillow, you already know the grind:
- 20–200 inbound messages per listing
- 80–90% are repeats: "Is this still available?", "When can I view it?", "Do you take vouchers?", "Pets?"
- If you respond slowly, good leads go cold — and Marketplace ranking can suffer
The obvious question is: why not automate it?
Because Marketplace personal inboxes don't provide an official messaging API that traditional automation tools can hook into. You end up in "browser automation land," where most solutions are brittle, crash-prone, and hard to make safe.
This post is the real implementation path we took — including what broke, what worked, and how to build a production-ready setup that:
- checks messages hourly
- drafts intelligent, guideline-compliant responses
- auto-sends with hard safety gates (no wrong-thread replies, no double-sends)
- fails safe when Facebook throws friction (checkpoint/2FA)
The Use Case
We wanted an hourly system that:
- Opens Facebook Marketplace inbox (personal profile) and Zillow
- Detects new / unread inquiries
- Reads enough context to understand what the person is asking
- Drafts a response using our guidelines (tone, policies, screening questions)
- Sends the response automatically
- Logs everything and never sends twice
And we wanted it to run without hijacking our daily browser — in other words, automation on a dedicated browser session.
What We Tried First (and Why It Failed)
1) "Just use OpenClaw browser automation"
We started with OpenClaw-style browsing (agent drives the UI). It works sometimes, but it's a classic trap:
- crashes and browser disconnects
- inconsistent UI element targeting
- hard to debug after the fact
- no built-in "never double-send" contract
If you're sending messages to real people, reliability isn't optional.
2) "Let the LLM drive the whole browser" (browser-use style)
We also looked at browser-use (Python) and similar "agent pilots the UI" stacks. They're great for demos, and they can be surprisingly capable — but for production messaging, they create unacceptable risk:
- an LLM can mis-click the wrong thread
- a retry can cause a double-send
- the agent can think it sent when it didn't
- you can't easily enforce idempotency at the UI level
For production, you want the LLM to do what it's best at (language) and deterministic code to do what it's best at (navigation + verification + sending).
3) Model routing instability (OpenRouter-style)
Routing layers can be fine, but for continuous unattended automation, you must assume intermittent failures (timeouts/502s) and engineer retries + backoff + fail-closed behavior.
The bigger point: the "AI model choice" matters less than whether your system is stateful, idempotent, observable, and restartable.
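In practice, "engineer retries + backoff + fail-closed" means wrapping every model call in bounded retries with exponential backoff and jitter, and treating exhausted retries as a signal to stop, not to guess. A minimal Python sketch (the wrapper is illustrative, not a specific library's API):

```python
import random
import time

def with_retries(fn, attempts=4, base_delay=1.0):
    """Call fn(); on transient failure, retry with exponential backoff
    plus jitter. If every attempt fails, re-raise so the caller can
    fail closed (skip this cycle, log a failure, alert a human)."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # fail closed: no send beats a blind retry
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

The key property is that a timeout never silently becomes a skipped message or, worse, a duplicate send: exhaustion surfaces as an explicit failure the orchestrator records.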
The Breakthrough: Run a Spike Before Building Anything Big
Before we built a "system," we ran the smallest possible test:
Can an agent reliably attach to a dedicated, logged-in Chrome session and extract thread data repeatedly?
We used Chrome DevTools MCP against a dedicated Chrome profile and ran 10 consecutive extractions.
Result: 10/10 successful runs, 0 failures, 0 checkpoint/2FA screens, stable thread counts and stable extraction.
That single test told us something crucial:
✅ We can build a robust automation layer without depending on the LLM to "figure out the UI."
✅ We can treat the browser as an API (via DevTools / CDP) and build deterministic logic around it.
The Production Architecture (What Actually Works)
Here's the architecture that scales from "spike" to "production":
Hourly Scheduler (cron/systemd timer)
|
v
Orchestrator (state machine + lock + DB)
|
+--> Browser Worker (deterministic)
| |
| +--> Dedicated Chrome session (logged-in)
|
+--> Reply Engine (LLM for text only)
|
+--> Send Gate (verification + idempotency)
|
v
DB (checkpoints + send_log + failures + artifacts index)
Key design decisions
- Do NOT let the LLM choose what to click. Deterministic code selects threads, extracts messages, and performs "send".
- Every send is protected by an idempotency contract. If we crash and restart, we must never send twice.
- Every send is verified. We confirm we are in the intended thread and that the inbound message didn't change.
- Challenge mode is first-class. If Marketplace shows checkpoint/2FA, we stop sending and alert.
- Artifacts on every failure. Screenshot + HTML snapshot + console logs (and trace when using Playwright) so failures are fixable.
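To make those decisions concrete, here is a minimal Python sketch of the orchestrator's hourly entry point. The `extract_threads`, `draft_reply`, and `send_gate` names are placeholder stubs standing in for your own modules; the real content is the non-blocking lock that guarantees runs never overlap:

```python
import fcntl  # POSIX file locking (macOS/Linux)

def extract_threads():
    """Placeholder for the deterministic Browser Worker."""
    return [{"thread_key": "t1", "inbound_text": "Is this still available?"}]

def draft_reply(record):
    """Placeholder for the Reply Engine (LLM generates text only)."""
    return {"intent": "STILL_AVAILABLE", "confidence": 0.9, "reply_text": "Yes!"}

def send_gate(record, draft):
    """Placeholder for the Send Gate (verification + idempotency + send)."""
    return f"queued: {record['thread_key']}"

def run_once(lock_path="/tmp/marketplace-bot.lock"):
    with open(lock_path, "w") as lock:
        try:
            # Non-blocking exclusive lock: if the previous hourly run is
            # still going, exit cleanly instead of running concurrently.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return "skipped"
        for record in extract_threads():
            send_gate(record, draft_reply(record))
        return "ok"
```

Note the shape: deterministic extraction, then drafting, then the gate, with no path where the LLM decides what to click.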
Step-by-Step: From Spike to Production
Step 1 — Run a dedicated Chrome profile (stable login + isolation)
macOS example:
mkdir -p "$HOME/.chrome-marketplace-bot"
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222 --user-data-dir="$HOME/.chrome-marketplace-bot" --no-first-run --no-default-browser-check "https://www.facebook.com/"
Log in manually and keep that window running (minimized). Do the same for Zillow in another tab within the same profile, or use a separate profile for isolation.
Why this matters: you're not "logging in with a bot." You're attaching to an already authenticated session.
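A cheap pre-flight check pays for itself here: before each hourly run, confirm the dedicated Chrome is actually listening on its DevTools port, and fail closed (skip the run + alert) if it isn't. Chrome's debugging port exposes `/json/version`, so the standard library is enough:

```python
import json
from urllib.request import urlopen

def cdp_is_up(port=9222):
    """Return True if a Chrome DevTools endpoint answers on this port.
    Run before each cycle; a False here means the session died (laptop
    slept, Chrome updated, etc.) and the run should be skipped."""
    try:
        with urlopen(f"http://127.0.0.1:{port}/json/version", timeout=3) as resp:
            return "Browser" in json.load(resp)
    except OSError:
        return False
```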
Step 2 — Use MCP for development, but design for Playwright production
For the spike and early development, MCP is excellent because it's fast to iterate and debug.
Chrome DevTools MCP via npx:
npx -y chrome-devtools-mcp@latest --browserUrl=http://127.0.0.1:9222
If you use Claude Code or Codex CLI, add the MCP server once:
# Claude Code
claude mcp add --transport stdio chrome-devtools -- npx -y chrome-devtools-mcp@latest --browserUrl=http://127.0.0.1:9222
# Codex CLI
codex mcp add chrome-devtools -- npx -y chrome-devtools-mcp@latest --browserUrl=http://127.0.0.1:9222
But: for the long run, build your Browser Worker so you can swap to Playwright (connectOverCDP) when you want tracing, retries, and better long-term automation ergonomics.
Step 3 — Normalize what you extract into stable objects
Your Browser Worker should output structured objects, not "agent thoughts."
Example normalized record:
{
"platform": "facebook_marketplace",
"thread_key": "selling:tiffany:some-stable-id",
"buyer_name": "Tiffany",
"inbound_msg_key": "msg:1738095123:hash",
"inbound_text": "Is this still available? When can I see it?",
"timestamp_utc": "2026-02-24T18:08:00Z",
"context": [
{"direction": "in", "text": "..."},
{"direction": "out", "text": "..."},
{"direction": "in", "text": "..."}
]
}
Why keys matter: you can't do safe automation without stable identifiers.
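One illustrative way to derive those keys in Python, mirroring the example record above. The exact scheme is an assumption on our part; anything works as long as the same inbound message always maps to the same key:

```python
import hashlib

def make_keys(buyer_name, listing_slug, inbound_text, ts_epoch):
    """Derive stable identifiers for a thread and an inbound message.
    Deterministic: re-extracting the same message yields the same keys,
    which is what makes idempotent sending possible downstream."""
    thread_key = f"selling:{buyer_name.lower()}:{listing_slug}"
    digest = hashlib.sha256(inbound_text.encode("utf-8")).hexdigest()[:12]
    inbound_msg_key = f"msg:{ts_epoch}:{digest}"
    return thread_key, inbound_msg_key
```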
Step 4 — The Reply Engine: LLM generates text only
Your LLM should return strict JSON:
- intent: one of STILL_AVAILABLE, SHOWING_REQUEST, PRICE, SCREENING, OTHER, SPAM
- confidence: 0–1
- reply_text: the drafted message
Draft prompt template:
You are an assistant helping respond to rental inquiries.
Rules:
- Be concise and friendly.
- Do not invent facts (price/address/availability) unless provided.
- Ask at most 2 questions.
- Prefer scheduling + screening basics (move-in date, # occupants, pets).
- No sensitive data requests.
CONTEXT:
Platform: {{platform}}
Buyer name: {{buyer_name}}
Listing context: {{listing_context}}
Last inbound message: {{inbound_text}}
Recent messages:
{{context}}
Return JSON only:
{"intent":"...","confidence":0.0,"reply_text":"..."}
Then run a self-check prompt that returns:
{"approved": true, "reasons": []}
If self-check fails, route to "needs approval" instead of sending.
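"Strict JSON" only helps if you actually enforce it. A sketch of the validation layer in Python (field names follow the JSON contract above; anything malformed routes to the review queue rather than the send path):

```python
import json

ALLOWED_INTENTS = {"STILL_AVAILABLE", "SHOWING_REQUEST", "PRICE",
                   "SCREENING", "OTHER", "SPAM"}

def parse_reply(raw):
    """Validate the LLM's raw output against the contract.
    Returns (data, None) on success, (None, reason) on any failure,
    so the caller fails closed into 'needs approval'."""
    try:
        data = json.loads(raw)
        assert data["intent"] in ALLOWED_INTENTS
        assert 0.0 <= float(data["confidence"]) <= 1.0
        assert isinstance(data["reply_text"], str) and data["reply_text"].strip()
        return data, None
    except (ValueError, KeyError, TypeError, AssertionError) as e:
        return None, f"needs_approval: {e}"
```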
Step 5 — The Send Gate (this is where production systems win)
Before sending a message, enforce all of these:
- Thread verification: prove you're in the same thread you extracted
- Inbound verification: last inbound message key is unchanged
- Idempotency: (thread_key, inbound_msg_key) not already in send_log
- Policy gating: only allowlisted intents with confidence ≥ threshold
- Post-send verification: confirm the outgoing message is visible in the thread
- Write send log: store outbound_hash + timestamp
Example send policy (start conservative):
- allowlist intents: STILL_AVAILABLE, SHOWING_REQUEST, AVAILABILITY
- confidence >= 0.80 and self-check approved
This is how you stop the two nightmare failures:
- replying in the wrong chat
- sending duplicates after retries
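Here is one way to sketch the idempotency piece in Python, leaning on the send_log UNIQUE constraint from the Appendix schema. The design choice worth noting: we claim the slot *before* sending, so a crash between claim and send loses one reply to the review queue, which is the failure direction you want, rather than risking a duplicate:

```python
import hashlib
import sqlite3

def claim_send(db, thread_key, inbound_msg_key, reply_text):
    """At-most-once send claim. The UNIQUE(thread_key, inbound_msg_key)
    constraint means the INSERT succeeds exactly once; a crashed run
    that retries gets an IntegrityError and skips, never re-sends."""
    outbound_hash = hashlib.sha256(reply_text.encode("utf-8")).hexdigest()
    try:
        db.execute(
            "INSERT INTO send_log (thread_key, inbound_msg_key, outbound_hash, sent_ts) "
            "VALUES (?, ?, ?, datetime('now'))",
            (thread_key, inbound_msg_key, outbound_hash),
        )
        db.commit()
        return True   # claim succeeded: safe to perform the actual send
    except sqlite3.IntegrityError:
        return False  # already claimed/sent for this inbound message
```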
Step 6 — Run hourly with locking and artifacts
Hourly does not mean "always-on tab automation." It means:
- run a job
- do work
- exit cleanly
Use either:
- cron + a lock (so you never overlap), or
- systemd timer + service (recommended)
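For the systemd route, a pair of illustrative unit files is all it takes (paths, user, and names here are assumptions, adjust to your layout). `Type=oneshot` also means the timer will not start a new run while the previous one is still executing:

```ini
# /etc/systemd/system/marketplace-bot.service
[Unit]
Description=Hourly Marketplace inquiry bot run

[Service]
Type=oneshot
User=bot
WorkingDirectory=/opt/marketplace-bot
ExecStart=/usr/bin/python3 run_once.py

# /etc/systemd/system/marketplace-bot.timer
[Unit]
Description=Run marketplace-bot hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
```

Enable with `systemctl enable --now marketplace-bot.timer`; `Persistent=true` catches up on a missed run after a reboot.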
Artifacts per run:
- artifacts/<run_id>/marketplace_extract.json
- artifacts/<run_id>/drafts.json
- artifacts/<run_id>/send_results.json
- on failure: screenshot + html snapshot + console logs
Adding Zillow
For Zillow you have two practical routes:
- Preferred: route Zillow leads/messages into a CRM inbox via integration, then automate there
- Fallback: automate the Zillow UI using the exact same contract (extract → draft → send gate)
Whichever you choose, keep the same invariants:
- stable IDs
- idempotency
- verification
- artifacts
- challenge mode handling (different UIs, same idea)
Deployment Choices: Laptop vs VM vs Provider
Laptop (fastest to start)
Works for hourly checks, but you will lose reliability when:
- your laptop sleeps
- your network changes
- Chrome updates/restarts unexpectedly
Small cloud VM (most robust for "set and forget")
A VM runs:
- Chrome profile (logged in)
- orchestrator (hourly)
- worker
Your computer interacts by:
- receiving alerts (challenge mode / failures)
- or triggering runs via an HTTP endpoint
Managed headless provider (optional)
A provider gives stable remote browsers and scaling. It can help, but it won't remove Marketplace's inherent friction. The core reliability comes from your state machine + send gate.
Risks, Guardrails, and Reality Checks
Marketplace is inherently fragile
The spike proved 10/10 stable reads. Great. But production will eventually hit:
- checkpoint/2FA
- UI changes
- timeouts
- slow loads
Plan for it:
- CHALLENGE_MODE: stop sending and alert
- keep artifacts
- resume after manual resolution
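The CHALLENGE_MODE check itself can be a dumb, deterministic gate that runs before any send. A Python sketch, where the marker strings are illustrative placeholders you should tune against the artifacts your own failed runs produce:

```python
CHALLENGE_MARKERS = (
    "/checkpoint/",           # Facebook checkpoint URL fragment
    "two-factor",             # illustrative page-content markers:
    "confirm your identity",  # tune against real captured artifacts
)

def detect_challenge(current_url, page_text):
    """Return True if the current page looks like checkpoint/2FA friction.
    A True result flips the run to CHALLENGE_MODE: stop all sending,
    save artifacts, alert a human, and resume only after manual fix."""
    haystack = (current_url + " " + page_text).lower()
    return any(marker in haystack for marker in CHALLENGE_MARKERS)
```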
Respect platform policies
Automating messaging may violate platform terms depending on usage. Build:
- conservative rate limits
- human review for low confidence
- clear audit logs
What "Production-Ready" Looks Like
If you implement the above, you get:
- reliable hourly checks
- safe auto-sends for high-confidence intents
- a review queue for everything else
- no duplicate sends
- no "agent hallucinated a click" problems
- fast debugging from artifacts
Appendix: Minimal DB Schema (SQLite)
CREATE TABLE IF NOT EXISTS threads (
id INTEGER PRIMARY KEY AUTOINCREMENT,
platform TEXT NOT NULL,
thread_key TEXT NOT NULL,
last_inbound_key TEXT,
last_seen_at TEXT,
status TEXT DEFAULT 'OK',
UNIQUE(platform, thread_key)
);
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
thread_key TEXT NOT NULL,
direction TEXT NOT NULL, -- 'in' or 'out'
msg_key TEXT NOT NULL,
ts TEXT,
text TEXT,
UNIQUE(thread_key, msg_key)
);
CREATE TABLE IF NOT EXISTS send_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
thread_key TEXT NOT NULL,
inbound_msg_key TEXT NOT NULL,
outbound_hash TEXT NOT NULL,
sent_ts TEXT NOT NULL,
UNIQUE(thread_key, inbound_msg_key)
);
CREATE TABLE IF NOT EXISTS checkpoints (
platform TEXT PRIMARY KEY,
cursor_json TEXT,
updated_at TEXT
);
CREATE TABLE IF NOT EXISTS failures (
id INTEGER PRIMARY KEY AUTOINCREMENT,
run_id TEXT NOT NULL,
platform TEXT NOT NULL,
reason TEXT NOT NULL,
artifact_path TEXT,
created_at TEXT NOT NULL
);
If you want this built quickly
The fastest engineering path is:
- keep using MCP for extraction while you harden the orchestrator + DB + send gate
- add sending behind allowlist + confidence threshold
- once stable, migrate the Browser Worker to Playwright connectOverCDP for better tracing and maintainability
- move to a VM when you want true "set and forget"
That's the difference between a cool demo and a system you can rely on.
Ready to automate your rental inquiries? Book an AI-First Fit Call and we'll help you build a production-ready automation system tailored to your specific property portfolio. Or explore how our AI consulting services can accelerate your transformation.
