<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Ross Rader</title>
    <link>https://rossrader.ca</link>
    <description>Writing on AI, Internet Services, and Customer Experience Design</description>
    <language>en-us</language>
    <atom:link href="https://rossrader.ca/feed.xml" rel="self" type="application/rss+xml" />

    <item>
      <title>Exploring Semantic MCP</title>
      <link>https://rossrader.ca/posts/semantic-mcp.html</link>
      <guid isPermaLink="true">https://rossrader.ca/posts/semantic-mcp.html</guid>
      <pubDate>Wed, 04 Mar 2026 05:00:00 GMT</pubDate>
      <description>I built an MCP server with zero API keys, zero databases, and zero state. A static file and five functions. It works better than it should.</description>
      <content:encoded><![CDATA[<h1>MCP as a semantic layer</h1>
<p>Most MCP servers connect AI to things it can&#39;t reach on its own. Send a Slack message. Create a GitHub issue. Query a database. The tools add capability.</p>
<p>There&#39;s a second kind of MCP server that nobody&#39;s really building yet. One where the AI <em>could</em> do the work itself, but the tools make it reason correctly instead 
of probably.</p>
<h2>Capability vs. meaning</h2>
<p>You could hand an AI a JSON file full of reference data and say &quot;figure it out.&quot; It will parse the schema, invent heuristics, cross-reference fields, and produce a 
plausible answer. It might even be right. But it&#39;s doing that from scratch every time, with no guarantee it interprets the data the way a domain expert would.</p>
<p>Wrap that same data in a thin set of tools, and something changes. The tools don&#39;t add new information. They add judgment. Priority order for evaluating conflicts. 
Ranking logic built on real-world experience. Labels that distinguish &quot;technically claimed but safe to use&quot; from &quot;do not touch.&quot;</p>
<p>The AI goes from reasoning probabilistically to reasoning reliably. Not because it gained a new capability, but because the tools reduced its cognitive load the 
same way a good API reduces a developer&#39;s.</p>
<h2>The experiment</h2>
<p>I built <a href="https://keygap.app">KeyGap</a> to test this. It&#39;s a keyboard shortcut availability tool: pick your platform, pick your apps, see which hotkey combos are still 
free. There&#39;s an interactive web UI and an MCP endpoint, both served from the same Cloudflare Worker, both reading from the same shortcut database.</p>
<p>I should be clear about what KeyGap is not. It&#39;s not a comprehensive shortcut reference. <a href="https://keycheck.dev">keycheck.dev</a> already does that far better, covering 
1,400+ shortcuts across 100+ apps with community contributions. KeyGap covers 12 apps. It&#39;s a toy by comparison.</p>
<p>The point was never &quot;can I build a better shortcut tool.&quot; The point was: what&#39;s the narrowest expression of an MCP server that creates semantic value? How little 
data and how few tools can you get away with and still produce something that makes an AI more reliable than its training data alone?</p>
<p>The answer: a single TypeScript file of curated data, three query functions, and good tool descriptions. One Cloudflare Worker, zero databases. The HTML is a big 
self-contained file. The data module is a big static object. Neither is elegant. But the constraint was deliberate. I wanted to find the floor, not the ceiling.</p>
<h2>What the tools add</h2>
<p>The shortcut data in KeyGap is roughly 300 lines of TypeScript. An AI could read the raw JSON and cross-reference it. But the tools encode things the data alone 
doesn&#39;t express.</p>
<p><code>suggest_hotkeys</code> doesn&#39;t just filter available keys. It ranks by ergonomic comfort, because a human figured out that Cmd+Ctrl is a natural 
left-thumb-plus-left-pinky reach while Cmd+Ctrl+Shift requires three fingers and falls apart at speed. That judgment isn&#39;t in the data. It&#39;s in the function.</p>
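<p>A minimal sketch of what that ranking might look like (the comfort weights and function shape here are my assumptions, not KeyGap&#39;s actual TypeScript):</p>

```python
# Hypothetical comfort weights: lower = easier reach. Illustrative only;
# the real scores would come from hands-on notes, not this table.
MODIFIER_COMFORT = {
    ("Cmd", "Ctrl"): 1,           # thumb + pinky, natural left-hand reach
    ("Cmd", "Shift"): 2,
    ("Cmd", "Ctrl", "Shift"): 5,  # three fingers, falls apart at speed
}

def suggest_hotkeys(free_combos):
    """Rank free (modifiers, key) combos by ergonomic comfort, not just availability."""
    return sorted(free_combos, key=lambda combo: MODIFIER_COMFORT.get(combo[0], 9))
```

<p>The filter alone would return both combos below as &quot;available&quot;; the sort is where the human judgment lives.</p>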
<p><code>check_key</code> doesn&#39;t just look up a value. It cascades through OS-level claims, then app claims, then risky-but-probably-fine status, in that priority order. The 
ordering reflects how shortcut conflicts actually work. Give an AI the raw data and it might check app claims first, or treat &quot;risky&quot; the same as &quot;taken.&quot;</p>
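<p>The cascade is easy to picture as code. A sketch under my own assumptions about the data shapes (the actual implementation is a TypeScript module and may differ):</p>

```python
# Hypothetical data shapes; the priority order is the point, not the contents.
OS_CLAIMS = {"Cmd+Space": "Spotlight"}
APP_CLAIMS = {"Cmd+K": ["Slack", "VS Code"]}
RISKY = {"Cmd+Ctrl+F": "toggles fullscreen in some apps"}

def check_key(combo, apps):
    # OS-level claims trump app claims, which trump "risky" status.
    if combo in OS_CLAIMS:
        return {"status": "taken", "by": OS_CLAIMS[combo], "level": "os"}
    claimed_by = [a for a in APP_CLAIMS.get(combo, []) if a in apps]
    if claimed_by:
        return {"status": "taken", "by": claimed_by, "level": "app"}
    if combo in RISKY:
        return {"status": "risky", "note": RISKY[combo]}
    return {"status": "free"}
```

<p>An AI reading the raw data could check these in any order; the function makes the ordering non-negotiable.</p>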
<p>The data is facts. The tools are opinions encoded as functions. Together, they&#39;re facts plus judgment.</p>
<h2>What a real version looks like</h2>
<p>KeyGap is the narrow version: hand-curated data, small scope, single file. A broader take on the same idea would pull from something like keycheck&#39;s 
community-maintained database, covering hundreds of apps instead of twelve. The tools would stay thin, but the ranking logic could get smarter: weighting by app 
install base, factoring in user-reported conflicts, scoring ergonomic comfort based on actual hand geometry research rather than my rough notes.</p>
<p>The web UI and the MCP tools would diverge further. The UI could become a full shortcut explorer with search, filtering, and contribution flows. The MCP tools would 
stay focused on the three questions an AI actually needs to answer: what&#39;s free, is this specific combo taken, and what do you recommend.</p>
<p>The architecture stays the same. Static data, thin query layer, two interfaces. The investment shifts from code to curation.</p>
<h2>Other approaches</h2>
<p>Static data plus tools is the simplest version of a knowledge server. It&#39;s not the only one.</p>
<p><strong>Retrieval-backed.</strong> The tools query a vector store or search index instead of holding everything in memory. You&#39;d need this for something like npm&#39;s full 
dependency graph or the CVE database. More infrastructure, but it scales.</p>
<p><strong>Computed.</strong> Some domains need to calculate answers, not look them up. A color contrast checker doesn&#39;t store every possible color pair. It computes WCAG 
compliance on the fly. The judgment is in the algorithm, not the data.</p>
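<p>To make the computed case concrete, here is the standard WCAG contrast math such a tool would run on the fly (this is the published formula, not any particular server&#39;s code):</p>

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance from an 8-bit sRGB triple."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio between two colors; WCAG AA requires 4.5:1 for body text."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Black on white yields the maximum possible ratio of 21:1.
```

<p>No dataset could enumerate all 2^48 color pairs; the judgment (the 4.5:1 threshold) rides along with the algorithm.</p>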
<p><strong>Hybrid.</strong> Static reference data combined with live lookups. A browser compatibility server might embed curated &quot;safe to use&quot; thresholds but fetch current support 
percentages from caniuse at query time. Static layer for opinions, live layer for facts.</p>
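<p>A hybrid server&#39;s query path might look like this sketch, where the threshold is the curated opinion and the lookup stands in for a real caniuse fetch (all names and numbers here are hypothetical):</p>

```python
# Curated opinion layer: what support level counts as "safe to use".
SAFE_SUPPORT_THRESHOLD = 95.0

def fetch_support_pct(feature):
    """Stand-in for the live layer; a real server would query caniuse here."""
    stub_data = {"css-grid": 97.8, "view-transitions": 72.4}  # made-up numbers
    return stub_data.get(feature, 0.0)

def is_safe_to_use(feature):
    pct = fetch_support_pct(feature)  # live fact
    return {"feature": feature, "support_pct": pct,
            "safe": pct >= SAFE_SUPPORT_THRESHOLD}  # static opinion
```

<p>Swap the stub for a real HTTP call and the opinion layer never needs to be redeployed when the facts move.</p>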
<p>If I were building another one of these, I&#39;d probably start with hybrid. The static data keeps the tools opinionated, and the live lookups keep them current. But 
the pure static version is where I&#39;d tell anyone to start, because it proves the concept with zero infrastructure risk.</p>
<h2>The pattern</h2>
<p>Three things make this work:</p>
<p><strong>Curated data, not just structured data.</strong> A CSV is structured. A curated dataset includes editorial judgment: risk assessments, quality ratings, &quot;this is the one 
you actually want&quot; designations. The curation is the value. The tools are just delivery.</p>
<p><strong>Domain logic in tools, not left to the AI.</strong> Anywhere you have reasoning that an AI would otherwise hallucinate, put it in a tool. The tool doesn&#39;t prevent the AI 
from thinking. It prevents it from guessing where guessing is unnecessary.</p>
<p><strong>Two interfaces from one source of truth.</strong> Humans get a visual interface optimized for scanning and pattern recognition. Machines get MCP tools optimized for 
structured queries and parameter validation. Neither wraps the other. They&#39;re parallel projections of the same data.</p>
<h2>Where it breaks down</h2>
<p>This doesn&#39;t work when data changes faster than you can redeploy, when the dataset outgrows memory, or when domain experts genuinely disagree on the right answer 
(encoding one opinion in a tool is limiting).</p>
<p>But for small, curated, slowly-changing reference data where cross-referencing is tedious and judgment matters? A static file and five functions might be all you 
need.</p>
<p>The current MCP ecosystem is almost entirely integration servers. I think there&#39;s a lot of room for knowledge servers. Not <em>&quot;connect me to a thing&quot;</em> but <em>&quot;help me 
reason correctly about a thing.&quot;</em> The bar is lower than people think. No API keys, no database, no state. Just curated data, a few functions, and well-written tool 
descriptions.</p>
]]></content:encoded>
      <author>rossrader@gmail.com (Ross Rader)</author>
      <category>mcp</category>
      <category>ai</category>
      <category>ideas</category>
    </item>
    <item>
      <title>Agile was built for a slower machine</title>
      <link>https://rossrader.ca/posts/agile-slower-machine.html</link>
      <guid isPermaLink="true">https://rossrader.ca/posts/agile-slower-machine.html</guid>
      <pubDate>Sun, 01 Mar 2026 05:00:00 GMT</pubDate>
      <description>AI tooling is making software teams faster than ever - but the processes they&apos;re running were designed for when speed was the problem. What happens when building is cheap and the bottleneck moves somewhere else?</description>
      <content:encoded><![CDATA[<h1>Agile was built for a slower machine</h1>
<p>I&#39;ve seen a growing conversation on LinkedIn and Reddit about what AI tooling is doing to 
software teams - Cursor, Claude, Copilot, the whole wave. Velocity is up, backlogs are 
shrinking - by every measure teams track, things look great. But what&#39;s catching my attention 
is the feeling that teams are losing context. Code review burdens have exploded as weeks are 
compressed into days, and features are shipping, but it&#39;s becoming harder to tell which ones 
moved the needle. Teams are building faster than ever, but it&#39;s not clear they&#39;re learning 
faster too.</p>
<p>And it got me thinking: sprints, backlogs, velocity, story points - all of it was designed to 
manage build capacity. To prioritize carefully, because building was expensive and slow. That 
was the bottleneck, and agile was a really good answer to it. But what if that&#39;s not the 
bottleneck anymore? For a growing chunk of the work, AI has made building fast and cheap. But 
the process hasn&#39;t changed. Teams are using AI to sprint faster through a system designed for 
when sprinting was hard.</p>
<p>If building isn&#39;t the constraint anymore, what is?</p>
]]></content:encoded>
      <author>rossrader@gmail.com (Ross Rader)</author>
      <category>agile</category>
      <category>vibes</category>
      <category>product</category>
    </item>
    <item>
      <title>Triggering Claude Chrome from the Command Line</title>
      <link>https://rossrader.ca/posts/cc-cc-cli.html</link>
      <guid isPermaLink="true">https://rossrader.ca/posts/cc-cc-cli.html</guid>
      <pubDate>Sat, 10 Jan 2026 05:00:00 GMT</pubDate>
      <description>PSA for anyone dealing with SaaS tools that have good reporting UIs but garbage APIs: Claude Chrome + Claude Code might be your workaround.</description>
<content:encoded><![CDATA[<p>I work with a bunch of SaaS applications that have decent reporting and export capabilities but generally
poor or unavailable reporting APIs, so getting that data into an automated pipeline means either building custom integrations or giving in to a manual process and doing a lot of clicking.</p>
<p>But what if I could just tell an AI to go click the buttons for me?</p>
<p>I spent some time this morning trying to figure that out. The answer is yes, with some caveats worth understanding before you try it yourself.</p>
<h2>The Experiment</h2>
<p>I had been itching to try out some new Claude features.  Anthropic&#39;s <a href="https://chromewebstore.google.com/detail/claude/fcoeoabgfenejglbffodgkkbkcdhcgfn">Claude for Chrome extension</a> launched recently as an add-on browser agent - it can navigate pages, fill forms, and interact with web applications on your behalf. I was curious whether I could trigger it from outside the browser, specifically from a script or Claude Code instance running from the command line.</p>
<p>If it works, it means I can build mini data pipelines that pull from multiple reporting systems and sources without building custom API integrations for each source. For example, think of grabbing the most recent forecast from a Google Drive folder, the most recent actuals from Looker, and supplemental details from various ad hoc sales, marketing, and customer service reports. The idea is to let the Chrome agent handle the browser automation and let my reporting scripts handle everything else.</p>
<p>I started with a simple test: get Claude Chrome to set up and download a dataset from Looker. Navigate to a saved query, configure the export options, download the CSV. Three steps that take me thirty seconds manually but don&#39;t have an easily accessible clean API equivalent.</p>
<h2>What I Learned</h2>
<p>The <code>--chrome</code> flag in Claude Code lets you run browser automation non-interactively:</p>
<pre><code class="language-bash">claude --chrome -p &quot;your prompt here&quot;
</code></pre>
<p><strong>Authentication is inherited.</strong> Claude uses your existing browser session. If you&#39;re logged into Looker in Chrome, the agent can access it. This is what makes the whole approach practical for internal tools.</p>
<p><strong>Direct URLs beat UI navigation.</strong> Every menu click is another opportunity for something to go wrong. If your tool gives you a URL with query parameters (like Looker&#39;s <code>?qid=...</code>), use that instead of instructing the agent to click through the interface. Fewer steps, fewer failures, faster results.</p>
<p><strong>Claude Chrome hesitates at downloads.</strong> My initial test worked right up until the final step - Claude wanted me to press the download button myself. That would defeat the automation I was looking for. I got around it by asking Claude to &quot;test the function before bothering me with any tasks,&quot; which worked. Later I found that adding &quot;Proceed without asking for confirmation&quot; to prompts is a cleaner solution.</p>
<p><strong>Safety guardrails are real.</strong> Curious about the boundaries, I tried getting Claude to complete a CAPTCHA using similar misdirection. It refused, correctly interpreting the context of the form and running right into its safety guardrails. Probably the right outcome.</p>
<p><strong>Saved shortcuts aren&#39;t accessible from CLI.</strong> The Chrome extension lets you save task shortcuts (like <code>/explore-orders</code>), but there&#39;s no way to call those from the command line. You have to replicate the full prompt in your script. This actually turns out to be fine—moving prompts into shell scripts or environment variables is more useful anyway. You can version control them, parameterize them, and chain them together.</p>
<p><strong>Profile selection doesn&#39;t work/doesn&#39;t exist.</strong> Claude connects to whatever Chrome profile it finds first, meaning that if you have separate Chrome profiles for work and home, it selects whichever one is currently active and has focus. The <code>--profile-directory</code> flag exists but gets ignored when Chrome is already running. If you&#39;re juggling work and personal accounts, you need to manually activate the correct profile before running your script. There&#39;s an <a href="https://github.com/anthropics/claude-code/issues/15125">open issue</a> tracking this.</p>
<h2>The Working Script</h2>
<p>Here&#39;s what I ended up with:</p>
<pre><code class="language-bash">#!/bin/bash
# NOTE: Ensure correct Chrome profile is active before running

claude --chrome -p &#39;Navigate to https://example.looker.com/explore/your_model/your_explore?qid=your_query_id

Once the page loads, click the gear icon in the top right corner. In the download options, ensure &quot;All Results&quot; is selected. Then click Download to export the data as CSV. Proceed with the download without asking for confirmation.&#39;
</code></pre>
<p>It works. The agent navigates to the saved query, configures the export, and downloads the CSV. No manual intervention required.</p>
<h2>What&#39;s Next</h2>
<p>Browser agents have come a long way in a year. I remember being impressed by the promise of Manus around this time last year, but it struggled with anything requiring authentication, and most of the browser-based solutions, like Comet, never seemed to hit the spot. With its tight pairing with Claude Code, Claude for Chrome might be the implementation that actually has the practical utility I&#39;m looking for - at least for now.</p>
<p>The next step is testing whether a headless Claude Code agent can reliably dispatch these browser tasks as part of a larger workflow, which doesn&#39;t feel like a high bar for CC. If it holds up, I&#39;ve got a path to automate data collection across a whole category of tools that were previously manual-only.</p>
]]></content:encoded>
      <author>rossrader@gmail.com (Ross Rader)</author>
      <category>AI</category>
      <category>Automation</category>
      <category>Claude</category>
    </item>
    <item>
      <title>Claude Code Skills vs. Spawned Subagents</title>
      <link>https://rossrader.ca/posts/skillsvagents.html</link>
      <guid isPermaLink="true">https://rossrader.ca/posts/skillsvagents.html</guid>
      <pubDate>Tue, 06 Jan 2026 05:00:00 GMT</pubDate>
<description>Claude Code skills and spawned subagents solve the same problem - selective context loading - but for different architectures. One works inside a session, the other across processes.</description>
      <content:encoded><![CDATA[<p>I&#39;ve been building an autonomous system using Claude Code. When I read about Claude Code skills - reusable prompt libraries with progressive loading - I wondered if they could help.</p>
<p>Spoiler alert - they can&#39;t. But the exploration taught me something useful about what I was already doing.</p>
<h2>How Skills Work</h2>
<p>Skills are SKILL.md files that live in <code>.claude/skills/</code>. They&#39;re designed for interactive Claude Code sessions. At startup, Claude loads just the name and description from each skill&#39;s YAML frontmatter—around 30-100 tokens per skill. The actual instructions in SKILL.md only get read into context when Claude decides the skill is relevant. Reference files bundled with the skill load later still, only when Claude needs them for a specific task.</p>
<p>The value proposition is progressive disclosure. You can have dozens of skills installed and only pay tokens for the ones you actually invoke.</p>
<h2>Why They Don&#39;t Work for My System</h2>
<p>My system spawns agents as separate CLI processes:</p>
<pre><code class="language-python">cmd = [
    &quot;claude&quot;, &quot;--print&quot;,
    &quot;-p&quot;, prompt,
    &quot;--model&quot;, model,
    &quot;--allowedTools&quot;, &quot;Read&quot;, &quot;Write&quot;, ...
]
subprocess.Popen(cmd, ...)
</code></pre>
<p>Each agent is a fresh <code>claude --print</code> invocation. Stateless. No session memory. All context arrives via the prompt string. The <code>--print</code> flag runs in non-interactive mode, so there&#39;s no progressive loading benefit.</p>
<p>Skills require an interactive Claude Code environment that maintains session state. I don&#39;t have one. My orchestrator is Python. Each spawn is isolated. Skills can&#39;t help here.</p>
<h2>The Realization</h2>
<p>Despite the technical incompatibility, skills and my system solve the same problem:</p>
<p><strong>Reusable instructions.</strong> Skills use SKILL.md files. I use AGENT.md files.</p>
<p><strong>Selective loading.</strong> Skills auto-invoke when Claude matches your request to their descriptions. I use a <code>TASK_CONTEXT_REQUIREMENTS</code> dictionary.</p>
<p><strong>Domain knowledge.</strong> Skills use a <code>reference/</code> folder. I use <code>hub/config/</code> and <code>hub/strategy/</code>.</p>
<p>I&#39;ve essentially built a custom skills system for spawned agents. It just operates at the process level instead of the session level.</p>
<p>Here&#39;s what that looks like in practice:</p>
<pre><code class="language-python">TASK_CONTEXT_REQUIREMENTS = {
    &quot;write_content&quot;: [&quot;strategy/pillars.md&quot;, &quot;config/site.json&quot;],
    &quot;analyze_performance&quot;: [&quot;config/site.json&quot;, &quot;strategy/goals.md&quot;],
    &quot;deploy_content&quot;: [&quot;config/site.json&quot;],
    # 30+ more mappings
}
</code></pre>
<p>Instead of loading everything, each task type gets only the context files it needs.</p>
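<p>A minimal sketch of how a dispatcher might consume that mapping before spawning the agent (the <code>build_prompt</code> helper and the hub layout are my assumptions, not the system&#39;s actual code):</p>

```python
from pathlib import Path

# Trimmed version of the dispatch table from the post; paths are illustrative.
TASK_CONTEXT_REQUIREMENTS = {
    "write_content": ["strategy/pillars.md", "config/site.json"],
    "deploy_content": ["config/site.json"],
}

def build_prompt(task_type, instructions, hub):
    """Assemble a spawned agent's prompt from only the context files its task needs."""
    sections = [instructions]
    for rel in TASK_CONTEXT_REQUIREMENTS.get(task_type, []):
        sections.append(f"## {rel}\n{(Path(hub) / rel).read_text()}")
    return "\n\n".join(sections)
```

<p>The assembled string becomes the <code>-p</code> argument to the spawned <code>claude --print</code> process.</p>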
<h2>When to Use Which</h2>
<p><strong>Use Claude Code skills when:</strong></p>
<p>You&#39;re working interactively. You want instructions to persist across conversations. You have domain-specific workflows—code review, PR creation, documentation. You want Claude to automatically invoke relevant capabilities based on your request.</p>
<p><strong>Use spawned subagents when:</strong></p>
<p>You need autonomous multi-agent orchestration. Agents need different tool permissions. You want programmatic control over dispatch. You&#39;re building pipelines where task A feeds task B feeds task C.</p>
<p>There&#39;s also a hybrid approach. If you&#39;re running an orchestrator from within Claude Code, you could use skills for the orchestrator itself while spawning subagents for the workers. The orchestrator benefits from skills; the workers don&#39;t.</p>
<p>There are probably other approaches too. I haven&#39;t learned them yet ;-)</p>
<h2>What I Learned</h2>
<p>Skills and spawned subagents serve different architectures. Skills are single-session knowledge injection. Subagents are multi-process autonomous execution.</p>
<p>If you&#39;re building an autonomous agent system, you&#39;re already doing what skills do. Just at the process level instead of the session level. The concept of &quot;load only what you need&quot; applies to both. The implementation differs.</p>
<p>The lesson isn&#39;t that one approach is better. It&#39;s that both approaches exist because the underlying problem is real: context is expensive, and selective loading matters.</p>
<p><em>For Anthropic&#39;s full comparison of Skills, subagents, Projects, MCP, and prompts, see their <a href="https://claude.com/blog/skills-explained">Skills Explained</a> guide.</em></p>
]]></content:encoded>
      <author>rossrader@gmail.com (Ross Rader)</author>
      <category>Claude Code</category>
      <category>Agents</category>
      <category>AI</category>
      <category>Learning</category>
    </item>
    <item>
      <title>Yes, AI Sucks. That Makes It The Best Time to Learn</title>
      <link>https://rossrader.ca/posts/aisucks.html</link>
      <guid isPermaLink="true">https://rossrader.ca/posts/aisucks.html</guid>
      <pubDate>Thu, 18 Dec 2025 05:00:00 GMT</pubDate>
      <description>Today’s models fail loudly and visibly. That makes their limits easy to study and their mistakes easy to catch. Future systems will not be so forgiving.</description>
<content:encoded><![CDATA[<p>There is a genre of AI writing that I see fairly regularly on LinkedIn, Facebook, and the like. It usually starts by telling you your AI sounds smart. Then it tells you it isn&#39;t. It warns that models fabricate, lack judgment, and should never be trusted. The conclusion is always the same. Don&#39;t confuse coherence with intelligence.</p>
<p>Most of the claims in these posts are overstated, overwrought or just plain wrong. But those matter less than the posture they encourage.</p>
<p>The posture is fear dressed up as sophistication.</p>
<p>People are right to be uneasy about AI. Large language models sound confident and get things wrong. Treating them like analysts or decision makers is a category error. Delegating judgment to a system that has no concept of truth is irresponsible.</p>
<p>All of that is correct.</p>
<p>But the second half of the thought is missing.</p>
<p>These are the worst models most of us will ever work with.</p>
<p>They are slow. They hallucinate. They require hand-holding. They fail loudly and often. <em>That&#39;s exactly why this moment matters.</em></p>
<p>Right now, the cost of misuse is low. Failure is obvious. The stakes are mostly optional. You can choose where and how to apply these tools. You can experiment without wiring them directly into core decisions. You can see the limitations in plain view.</p>
<p>That won&#39;t last.</p>
<blockquote class="pullquote">
Before someone builds a “super-bad robot,” someone has to build a “mildly bad robot,” and before that a “not-so-bad robot.” – Michio Kaku
</blockquote>

<p>Remember how easy it used to be to pick out AI-generated images, and how quickly that became a real challenge as the lines between &quot;real&quot; and &quot;artificial&quot; got really blurry?</p>
<p>Future systems will be more capable and more convincing. They&#39;ll fail less often and more quietly. They&#39;ll be embedded deeper into workflows before most organizations understand how they behave. Learning how to work with probabilistic systems after you depend on them is much harder than learning before.</p>
<p>The real opportunity isn&#39;t in pretending these models reason or think. It&#39;s in learning how to design around their weaknesses.</p>
<p>That learning doesn&#39;t happen by standing at a distance and pointing out flaws. It happens by use.</p>
<p>Using these systems well means specific things.</p>
<p>It means separating generation from validation, never treating an output as an answer and forcing assumptions into the open. It means building checks instead of trust and deciding, explicitly, where judgment lives.</p>
<p>Whether a model “reasons” is a distraction. What matters is whether you reason about its output.</p>
<p>Used this way, language models aren&#39;t analysts. They aren&#39;t strategists. They aren&#39;t authorities. They are tools for synthesis, exploration, and pressure-testing ideas. Sometimes they are noise. Sometimes they are useful. Your job is to know which is which.</p>
<p>Pundits who dismiss these systems outright often sound savvy, but in practice, it is usually a performative move to signal distance from hype. It doesn&#39;t engage with the substance and often misses the point.</p>
<p>The risk isn&#39;t that people overestimate today’s models. The risk is that we fail to build the discipline required for tomorrow.</p>
<p>The window where you can learn cheaply, visibly, and with low consequence is now. Ignoring it doesn&#39;t make you prudent, it makes you unprepared.</p>
]]></content:encoded>
      <author>rossrader@gmail.com (Ross Rader)</author>
      <category>AI</category>
      <category>Thoughts</category>
      <category>Innovation</category>
      <category>Learning</category>
    </item>
    <item>
      <title>Multi Model Deep Research: Not Quite What I Expected</title>
      <link>https://rossrader.ca/posts/multi-model-deep-research.html</link>
      <guid isPermaLink="true">https://rossrader.ca/posts/multi-model-deep-research.html</guid>
      <pubDate>Thu, 27 Nov 2025 05:00:00 GMT</pubDate>
<description>I tried to see if I could get better results by combining the efforts of three AI assistants. The output wasn&apos;t all that satisfying, but I learned a few things.</description>
      <content:encoded><![CDATA[<h1>Multi-model Deep Research</h1>
<p>I ran a small experiment today that I thought was worth sharing. I needed to run some deep research, accounting for operations data from the business. That gave me an opportunity to test how different models handle a detailed, context rich prompt informed by a fixed dataset and I was especially interested in messing around with Gemini 3 Pro &quot;for real&quot;.</p>
<p>My first pass was simple. I handed the same prompt and dataset to three models and requested a deep research-style analysis of the data and market. ChatGPT 5.1 stalled after several attempts. Claude delivered a strong outline but missed some basic facts. Gemini produced sharp analysis but lost important operational detail that Claude had caught. Each of them produced helpful parts, but none of them produced a complete or consistent answer.</p>
<p>So the challenge shifted from producing a meaningful analysis to extracting the best material from each run and merging it into a single view without losing context. That gave me the idea to layer in a second stage. I gave a fresh instance two things: first, the full set of inputs used in the original prompts, and second, the full set of reports that Claude and Gemini produced in the first go-round. I intentionally presented the reports as written by someone else, which made a meaningful difference. Earlier runs had treated the reports as my own work, which muted the critique as the chatbot skewed toward agreeability. Once I made it clear that the reports came from an external source, the review became sharper and more useful.</p>
<p>Here is the core prompt I used with Gemini 3 Pro:</p>
<blockquote>
<p>You are a senior telecom market and competition analyst. I have received competing reports from two other analysts that provide differing perspectives on the market, competition and data. Evaluate the two reports I attached. Use the instructions and data I gave to the analysts, which they used to prepare their reports, as your baseline. Identify where each report is inaccurate, incomplete, or inconsistent, and also the strongest, most compelling points from each. Resolve any conflicts and produce one set of recommendations in a detailed document. Your output should be precise, internally consistent, and supported with clear reasoning.</p>
</blockquote>
<p>Long story short, the results were merely &quot;okay&quot;. The merged output was clearer, more accurate, and more consistent than the individual runs, but overall it lacked the depth of the two input documents, even with the benefit of the same context. After six different tries, with slight tweaks to the prompt each time, it still felt like I was reading a copy of a copy. For all the extra effort, the additional steps don&#39;t quite seem to be worth it.</p>
]]></content:encoded>
      <author>rossrader@gmail.com (Ross Rader)</author>
      <category>Prompt Design</category>
      <category>Experiments</category>
      <category>Models</category>
    </item>
    <item>
      <title>Day 1</title>
      <link>https://rossrader.ca/posts/hellowhirled.html</link>
      <guid isPermaLink="true">https://rossrader.ca/posts/hellowhirled.html</guid>
      <pubDate>Sat, 22 Nov 2025 05:00:00 GMT</pubDate>
      <description>A personal site feels right again, so I rebuilt one with a simple GitHub and Netlify workflow and a bit of nostalgia for how the web used to work. If you’re curious how it came together or why I bothered, read on...</description>
      <content:encoded><![CDATA[<p>I’ve been meaning to start writing again, but I kept stalling on the question of &quot;where?&quot; I’ve never liked handing my thoughts to someone else’s platform, and that feels even worse now given the chase for likes and the pile of AI generated slop.</p>
<p>I wanted to learn more about Netlify and similar services, and I needed a use case, so I set up a static site that commits to GitHub and gets automatically pushed into production at Netlify. There are simpler ways to get HTML online - I still remember hand coding HTML in 1993 using <a href="https://datatracker.ietf.org/doc/html/draft-ietf-iiir-html-00">the original HTML spec</a> and FTPing files to the company web server. </p>
<p>This site runs on a simple build and deployment flow. A local script turns Markdown into HTML, commits it to a Git repo, and Netlify deploys it. If I wasn’t interested in leaving room for app driven dynamic content later, I’d probably just use GitHub Pages, Surge.sh, or a host like NearlyFreeSpeech.</p>
<p>So, Hello World! rossrader.ca is back on the air.</p>
]]></content:encoded>
      <author>rossrader@gmail.com (Ross Rader)</author>
      <category>Meta</category>
      <category>Method</category>
      <category>AI</category>
    </item>
  </channel>
</rss>