If you manage rental properties on Facebook Marketplace (personal profile) and Zillow, you already know the grind:
- 20–200 inbound messages per listing
- 80–90% are repeats: "Is this still available?", "When can I view it?", "Do you take vouchers?", "Pets?"
- If you respond slowly, good leads go cold — and Marketplace ranking can suffer
The obvious question is: why not automate it?
Because Marketplace personal inboxes don't provide an official messaging API that traditional automation tools can hook into. You end up in "browser automation land," where most solutions are brittle, crash-prone, and hard to make safe.
This post is the real implementation path we took — including what broke, what worked, and how to build a production-ready setup that:
- checks messages hourly
- drafts intelligent, guideline-compliant responses
- auto-sends with hard safety gates (no wrong-thread replies, no double-sends)
- fails safe when Facebook throws friction (checkpoint/2FA)
The Use Case
We wanted an hourly system that:
- Opens Facebook Marketplace inbox (personal profile) and Zillow
- Detects new / unread inquiries
- Reads enough context to understand what the person is asking
- Drafts a response using our guidelines (tone, policies, screening questions)
- Sends the response automatically
- Logs everything and never sends twice
And we wanted it to run without hijacking our daily browser — in other words, automation on a dedicated browser session.
What We Tried First (and Why It Failed)
1) "Just use OpenClaw browser automation"
We started with OpenClaw-style browsing (agent drives the UI). It works sometimes, but it's a classic trap:
- crashes and browser disconnects
- inconsistent UI element targeting
- hard to debug after the fact
- no built-in "never double-send" contract
If you're sending messages to real people, reliability isn't optional.
2) "Let the LLM drive the whole browser" (browser-use style)
We also looked at browser-use (Python) and similar "agent pilots the UI" stacks. They're great for demos, and they can be surprisingly capable — but for production messaging, they create unacceptable risk:
- an LLM can mis-click the wrong thread
- a retry can cause a double-send
- the agent can think it sent when it didn't
- you can't easily enforce idempotency at the UI level
For production, you want the LLM to do what it's best at (language) and deterministic code to do what it's best at (navigation + verification + sending).
3) Model routing instability (OpenRouter-style)
Routing layers can be fine, but for continuous unattended automation, you must assume intermittent failures (timeouts/502s) and engineer retries + backoff + fail-closed behavior.
The bigger point: the "AI model choice" matters less than whether your system is stateful, idempotent, observable, and restartable.
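In practice, "engineer retries + backoff + fail-closed" means wrapping every model call in bounded retries with exponential backoff and jitter, and treating exhausted retries as a signal to stop, not to guess. A minimal Python sketch (the wrapper is illustrative, not a specific library's API):

```python
import random
import time

def with_retries(fn, attempts=4, base_delay=1.0):
    """Call fn(); on transient failure, retry with exponential backoff
    plus jitter. If every attempt fails, re-raise so the caller can
    fail closed (skip this cycle, log a failure, alert a human)."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # fail closed: no send beats a blind retry
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

The key property is that a timeout never silently becomes a skipped message or, worse, a duplicate send: exhaustion surfaces as an explicit failure the orchestrator records.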
The Breakthrough: Run a Spike Before Building Anything Big
Before we built a "system," we ran the smallest possible test:
Can an agent reliably attach to a dedicated, logged-in Chrome session and extract thread data repeatedly?
We used Chrome DevTools MCP against a dedicated Chrome profile and ran 10 consecutive extractions.
Result: 10/10 successful runs, 0 failures, 0 checkpoint/2FA screens, stable thread counts and stable extraction.
That single test told us something crucial:
✅ We can build a robust automation layer without depending on the LLM to "figure out the UI."
✅ We can treat the browser as an API (via DevTools / CDP) and build deterministic logic around it.
The Production Architecture (What Actually Works)
Here's the architecture that scales from "spike" to "production":
Hourly Scheduler (cron/systemd timer)
|
v
Orchestrator (state machine + lock + DB)
|
+--> Browser Worker (deterministic)
| |
| +--> Dedicated Chrome session (logged-in)
|
+--> Reply Engine (LLM for text only)
|
+--> Send Gate (verification + idempotency)
|
v
DB (checkpoints + send_log + failures + artifacts index)
Key design decisions
- Do NOT let the LLM choose what to click. Deterministic code selects threads, extracts messages, and performs "send".
- Every send is protected by an idempotency contract. If we crash and restart, we must never send twice.
- Every send is verified. We confirm we are in the intended thread and that the inbound message didn't change.
- Challenge mode is first-class. If Marketplace shows checkpoint/2FA, we stop sending and alert.
- Artifacts on every failure. Screenshot + HTML snapshot + console logs (and trace when using Playwright) so failures are fixable.
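To make those decisions concrete, here is a minimal Python sketch of the orchestrator's hourly entry point. The `extract_threads`, `draft_reply`, and `send_gate` names are placeholder stubs standing in for your own modules; the real content is the non-blocking lock that guarantees runs never overlap:

```python
import fcntl  # POSIX file locking (macOS/Linux)

def extract_threads():
    """Placeholder for the deterministic Browser Worker."""
    return [{"thread_key": "t1", "inbound_text": "Is this still available?"}]

def draft_reply(record):
    """Placeholder for the Reply Engine (LLM generates text only)."""
    return {"intent": "STILL_AVAILABLE", "confidence": 0.9, "reply_text": "Yes!"}

def send_gate(record, draft):
    """Placeholder for the Send Gate (verification + idempotency + send)."""
    return f"queued: {record['thread_key']}"

def run_once(lock_path="/tmp/marketplace-bot.lock"):
    with open(lock_path, "w") as lock:
        try:
            # Non-blocking exclusive lock: if the previous hourly run is
            # still going, exit cleanly instead of running concurrently.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return "skipped"
        for record in extract_threads():
            send_gate(record, draft_reply(record))
        return "ok"
```

Note the shape: deterministic extraction, then drafting, then the gate, with no path where the LLM decides what to click.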
Step-by-Step: From Spike to Production
Step 1 — Run a dedicated Chrome profile (stable login + isolation)
macOS example:
mkdir -p "$HOME/.chrome-marketplace-bot"
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222 --user-data-dir="$HOME/.chrome-marketplace-bot" --no-first-run --no-default-browser-check "https://www.facebook.com/"
Log in manually and keep that window running (minimized). Do the same for Zillow in another tab within the same profile, or use a separate profile for isolation.
Why this matters: you're not "logging in with a bot." You're attaching to an already authenticated session.
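A cheap pre-flight check pays for itself here: before each hourly run, confirm the dedicated Chrome is actually listening on its DevTools port, and fail closed (skip the run + alert) if it isn't. Chrome's debugging port exposes `/json/version`, so the standard library is enough:

```python
import json
from urllib.request import urlopen

def cdp_is_up(port=9222):
    """Return True if a Chrome DevTools endpoint answers on this port.
    Run before each cycle; a False here means the session died (laptop
    slept, Chrome updated, etc.) and the run should be skipped."""
    try:
        with urlopen(f"http://127.0.0.1:{port}/json/version", timeout=3) as resp:
            return "Browser" in json.load(resp)
    except OSError:
        return False
```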
Step 2 — Use MCP for development, but design for Playwright production
For the spike and early development, MCP is excellent because it's fast to iterate and debug.
Chrome DevTools MCP via npx:
npx -y chrome-devtools-mcp@latest --browserUrl=http://127.0.0.1:9222
If you use Claude Code or Codex CLI, add the MCP server once:
# Claude Code
claude mcp add --transport stdio chrome-devtools -- npx -y chrome-devtools-mcp@latest --browserUrl=http://127.0.0.1:9222
# Codex CLI
codex mcp add chrome-devtools -- npx -y chrome-devtools-mcp@latest --browserUrl=http://127.0.0.1:9222
But: for the long run, build your Browser Worker so you can swap to Playwright (connectOverCDP) when you want tracing, retries, and better long-term automation ergonomics.
Step 3 — Normalize what you extract into stable objects
Your Browser Worker should output structured objects, not "agent thoughts."
Example normalized record:
{
"platform": "facebook_marketplace",
"thread_key": "selling:tiffany:some-stable-id",
"buyer_name": "Tiffany",
"inbound_msg_key": "msg:1738095123:hash",
"inbound_text": "Is this still available? When can I see it?",
"timestamp_utc": "2026-02-24T18:08:00Z",
"context": [
{"direction": "in", "text": "..."},
{"direction": "out", "text": "..."},
{"direction": "in", "text": "..."}
]
}
Why keys matter: you can't do safe automation without stable identifiers.
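One illustrative way to derive those keys in Python, mirroring the example record above. The exact scheme is an assumption on our part; anything works as long as the same inbound message always maps to the same key:

```python
import hashlib

def make_keys(buyer_name, listing_slug, inbound_text, ts_epoch):
    """Derive stable identifiers for a thread and an inbound message.
    Deterministic: re-extracting the same message yields the same keys,
    which is what makes idempotent sending possible downstream."""
    thread_key = f"selling:{buyer_name.lower()}:{listing_slug}"
    digest = hashlib.sha256(inbound_text.encode("utf-8")).hexdigest()[:12]
    inbound_msg_key = f"msg:{ts_epoch}:{digest}"
    return thread_key, inbound_msg_key
```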
Step 4 — The Reply Engine: LLM generates text only
Your LLM should return strict JSON:
- intent: one of STILL_AVAILABLE, SHOWING_REQUEST, PRICE, SCREENING, OTHER, SPAM
- confidence: 0–1
- reply_text: the drafted message
Draft prompt template:
You are an assistant helping respond to rental inquiries.
Rules:
- Be concise and friendly.
- Do not invent facts (price/address/availability) unless provided.
- Ask at most 2 questions.
- Prefer scheduling + screening basics (move-in date, # occupants, pets).
- No sensitive data requests.
CONTEXT:
Platform: {{platform}}
Buyer name: {{buyer_name}}
Listing context: {{listing_context}}
Last inbound message: {{inbound_text}}
Recent messages:
{{context}}
Return JSON only:
{"intent":"...","confidence":0.0,"reply_text":"..."}
Then run a self-check prompt that returns:
{"approved": true, "reasons": []}
If self-check fails, route to "needs approval" instead of sending.
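"Strict JSON" only helps if you actually enforce it. A sketch of the validation layer in Python (field names follow the JSON contract above; anything malformed routes to the review queue rather than the send path):

```python
import json

ALLOWED_INTENTS = {"STILL_AVAILABLE", "SHOWING_REQUEST", "PRICE",
                   "SCREENING", "OTHER", "SPAM"}

def parse_reply(raw):
    """Validate the LLM's raw output against the contract.
    Returns (data, None) on success, (None, reason) on any failure,
    so the caller fails closed into 'needs approval'."""
    try:
        data = json.loads(raw)
        assert data["intent"] in ALLOWED_INTENTS
        assert 0.0 <= float(data["confidence"]) <= 1.0
        assert isinstance(data["reply_text"], str) and data["reply_text"].strip()
        return data, None
    except (ValueError, KeyError, TypeError, AssertionError) as e:
        return None, f"needs_approval: {e}"
```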
Step 5 — The Send Gate (this is where production systems win)
Before sending a message, enforce all of these:
- Thread verification: prove you're in the same thread you extracted
- Inbound verification: last inbound message key is unchanged
- Idempotency: (thread_key, inbound_msg_key) not already in send_log
- Policy gating: only allowlisted intents with confidence ≥ threshold
- Post-send verification: confirm the outgoing message is visible in the thread
- Write send log: store outbound_hash + timestamp
Example send policy (start conservative):
- allowlist intents: STILL_AVAILABLE, SHOWING_REQUEST, AVAILABILITY
- confidence >= 0.80 and self-check approved
This is how you stop the two nightmare failures:
- replying in the wrong chat
- sending duplicates after retries
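Here is one way to sketch the idempotency piece in Python, leaning on the send_log UNIQUE constraint from the Appendix schema. The design choice worth noting: we claim the slot *before* sending, so a crash between claim and send loses one reply to the review queue, which is the failure direction you want, rather than risking a duplicate:

```python
import hashlib
import sqlite3

def claim_send(db, thread_key, inbound_msg_key, reply_text):
    """At-most-once send claim. The UNIQUE(thread_key, inbound_msg_key)
    constraint means the INSERT succeeds exactly once; a crashed run
    that retries gets an IntegrityError and skips, never re-sends."""
    outbound_hash = hashlib.sha256(reply_text.encode("utf-8")).hexdigest()
    try:
        db.execute(
            "INSERT INTO send_log (thread_key, inbound_msg_key, outbound_hash, sent_ts) "
            "VALUES (?, ?, ?, datetime('now'))",
            (thread_key, inbound_msg_key, outbound_hash),
        )
        db.commit()
        return True   # claim succeeded: safe to perform the actual send
    except sqlite3.IntegrityError:
        return False  # already claimed/sent for this inbound message
```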
Step 6 — Run hourly with locking and artifacts
Hourly does not mean "always-on tab automation." It means:
- run a job
- do work
- exit cleanly
Use either:
- cron + a lock (so you never overlap), or
- systemd timer + service (recommended)
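For the systemd route, a pair of illustrative unit files is all it takes (paths, user, and names here are assumptions, adjust to your layout). `Type=oneshot` also means the timer will not start a new run while the previous one is still executing:

```ini
# /etc/systemd/system/marketplace-bot.service
[Unit]
Description=Hourly Marketplace inquiry bot run

[Service]
Type=oneshot
User=bot
WorkingDirectory=/opt/marketplace-bot
ExecStart=/usr/bin/python3 run_once.py

# /etc/systemd/system/marketplace-bot.timer
[Unit]
Description=Run marketplace-bot hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
```

Enable with `systemctl enable --now marketplace-bot.timer`; `Persistent=true` catches up on a missed run after a reboot.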
Artifacts per run:
- artifacts/<run_id>/marketplace_extract.json
- artifacts/<run_id>/drafts.json
- artifacts/<run_id>/send_results.json
- on failure: screenshot + html snapshot + console logs
Adding Zillow
For Zillow you have two practical routes:
- Preferred: route Zillow leads/messages into a CRM inbox via integration, then automate there
- Fallback: automate the Zillow UI using the exact same contract (extract → draft → send gate)
Whichever you choose, keep the same invariants:
- stable IDs
- idempotency
- verification
- artifacts
- challenge mode handling (different UIs, same idea)
Deployment Choices: Laptop vs VM vs Provider
Laptop (fastest to start)
Works for hourly checks, but you will lose reliability when:
- your laptop sleeps
- your network changes
- Chrome updates/restarts unexpectedly
Small cloud VM (most robust for "set and forget")
A VM runs:
- Chrome profile (logged in)
- orchestrator (hourly)
- worker
Your computer interacts by:
- receiving alerts (challenge mode / failures)
- or triggering runs via an HTTP endpoint
Managed headless provider (optional)
A provider gives stable remote browsers and scaling. It can help, but it won't remove Marketplace's inherent friction. The core reliability comes from your state machine + send gate.
Risks, Guardrails, and Reality Checks
Marketplace is inherently fragile
The spike proved 10/10 stable reads. Great. But production will eventually hit:
- checkpoint/2FA
- UI changes
- timeouts
- slow loads
Plan for it:
- CHALLENGE_MODE: stop sending and alert
- keep artifacts
- resume after manual resolution
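The CHALLENGE_MODE check itself can be a dumb, deterministic gate that runs before any send. A Python sketch, where the marker strings are illustrative placeholders you should tune against the artifacts your own failed runs produce:

```python
CHALLENGE_MARKERS = (
    "/checkpoint/",           # Facebook checkpoint URL fragment
    "two-factor",             # illustrative page-content markers:
    "confirm your identity",  # tune against real captured artifacts
)

def detect_challenge(current_url, page_text):
    """Return True if the current page looks like checkpoint/2FA friction.
    A True result flips the run to CHALLENGE_MODE: stop all sending,
    save artifacts, alert a human, and resume only after manual fix."""
    haystack = (current_url + " " + page_text).lower()
    return any(marker in haystack for marker in CHALLENGE_MARKERS)
```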
Respect platform policies
Automating messaging may violate platform terms depending on usage. Build:
- conservative rate limits
- human review for low confidence
- clear audit logs
What "Production-Ready" Looks Like
If you implement the above, you get:
- reliable hourly checks
- safe auto-sends for high-confidence intents
- a review queue for everything else
- no duplicate sends
- no "agent hallucinated a click" problems
- fast debugging from artifacts
Appendix: Minimal DB Schema (SQLite)
CREATE TABLE IF NOT EXISTS threads (
id INTEGER PRIMARY KEY AUTOINCREMENT,
platform TEXT NOT NULL,
thread_key TEXT NOT NULL,
last_inbound_key TEXT,
last_seen_at TEXT,
status TEXT DEFAULT 'OK',
UNIQUE(platform, thread_key)
);
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
thread_key TEXT NOT NULL,
direction TEXT NOT NULL, -- 'in' or 'out'
msg_key TEXT NOT NULL,
ts TEXT,
text TEXT,
UNIQUE(thread_key, msg_key)
);
CREATE TABLE IF NOT EXISTS send_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
thread_key TEXT NOT NULL,
inbound_msg_key TEXT NOT NULL,
outbound_hash TEXT NOT NULL,
sent_ts TEXT NOT NULL,
UNIQUE(thread_key, inbound_msg_key)
);
CREATE TABLE IF NOT EXISTS checkpoints (
platform TEXT PRIMARY KEY,
cursor_json TEXT,
updated_at TEXT
);
CREATE TABLE IF NOT EXISTS failures (
id INTEGER PRIMARY KEY AUTOINCREMENT,
run_id TEXT NOT NULL,
platform TEXT NOT NULL,
reason TEXT NOT NULL,
artifact_path TEXT,
created_at TEXT NOT NULL
);
If you want this built quickly
The fastest engineering path is:
- keep using MCP for extraction while you harden the orchestrator + DB + send gate
- add sending behind allowlist + confidence threshold
- once stable, migrate the Browser Worker to Playwright connectOverCDP for better tracing and maintainability
- move to a VM when you want true "set and forget"
That's the difference between a cool demo and a system you can rely on.
Ready to automate your rental inquiries? Book an AI-First Fit Call and we'll help you build a production-ready automation system tailored to your specific property portfolio. Or explore how our AI consulting services can accelerate your transformation.
