
How to train Spur's AI on your website

Your website is where your AI agent gets 90% of its knowledge about your products, services, and brand. This guide walks through everything you need to know to get it trained properly.


Before You Start: Website Requirements

Shopify Stores

If you're on Shopify, you're golden. Spur connects directly and can crawl your store without any extra setup.

Custom Websites & Other Platforms

Got a custom website, or using something other than Shopify? You might hit a roadblock before training can start.

Common issue: Many custom websites have bot protection, firewalls, or crawl restrictions that block Spur from reading your content.

The fix: You'll need to temporarily lift these protections so we can crawl your site. Here's what to tell your developer:

  1. Disable bot protection for Spur's crawlers temporarily

  2. Whitelist our crawling IPs (we can provide these)

  3. Remove any robots.txt restrictions that block automated crawling (see the example after this list)

  4. Turn off aggressive firewalls during the crawling period
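
As a rough illustration of point 3, here's what an opened-up robots.txt could look like. This is a sketch, not an official snippet: the user-agent name "SpurBot" is a placeholder, so confirm the real crawler user-agent (and the IPs to whitelist) with Spur support before your developer edits anything.

    # Placeholder user-agent - confirm the actual one with Spur support
    User-agent: SpurBot
    Allow: /

    # All other automated crawlers stay blocked
    User-agent: *
    Disallow: /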

Once we've crawled everything, your developer can turn all the protections back on. Spur only needs access during the initial training and any retraining sessions.

Don't have a developer?

Some website builders (Wix, Squarespace, etc.) have these protections built in. Check your platform's settings for "bot protection," "crawler settings," or "SEO tools" - you might be able to adjust them yourself.

Setting Up Website Training

Step 1: Choose Your Root URL

This is crucial. Use your main homepage URL, not a specific product or category page.

Good examples:

  • yourstore.com

  • shop.yourbrand.com

  • www.yourbusiness.co.uk

Avoid these:

  • yourstore.com/products/bestsellers

  • yourstore.com/collections/winter

  • yourstore.com/pages/about

Why? Spur crawls from your root URL and follows all internal links. Starting from the homepage ensures we discover your entire site structure.
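
To see why the starting point matters, here's a toy sketch of how link-following crawls generally work. It's illustrative only - not Spur's actual crawler - and it assumes the third-party requests library and a placeholder store URL. Started from the homepage it keeps discovering same-domain links; started from a deep page like /collections/winter it only ever finds whatever that page happens to link to.

    # Illustrative toy crawler - NOT Spur's actual implementation.
    # Shows why a crawl that starts at the root URL discovers far more of the site.
    import re
    from collections import deque
    from urllib.parse import urljoin, urlparse

    import requests  # assumed third-party dependency


    def discover(root_url, max_pages=50):
        """Breadth-first walk of same-domain links, starting from root_url."""
        domain = urlparse(root_url).netloc
        seen, queue = {root_url}, deque([root_url])
        while queue and len(seen) < max_pages:
            url = queue.popleft()
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue  # unreachable pages get skipped, just like a failed crawl
            for href in re.findall(r'href="([^"#]+)"', html):  # crude link extraction, fine for a sketch
                absolute = urljoin(url, href)
                if urlparse(absolute).netloc == domain and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
        return seen


    # Starting from the homepage reaches the whole linked structure (up to the cap);
    # starting from a single deep page usually reaches only whatever it links to.
    print(len(discover("https://yourstore.com")))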

Step 2: Crawl vs Add Link

Use Crawl when:

  • You want your bot to know everything on your site

  • Your site has good internal linking between pages

  • You're training for the first time

  • You want comprehensive coverage

Use Add Link when:

  • Adding one specific page after initial crawling

  • You have a page that's not linked from anywhere else

  • Testing how a single page trains before crawling everything

  • Adding landing pages or hidden content

Step 3: Start the Crawling Process

Click Crawl and wait. You can close the browser or navigate away during crawling; the training will continue and, in most cases, won't take more than 5 minutes.

Understanding Crawl Status

Each page shows one of these statuses:

Available ✅

Your bot knows this content and can answer questions about it. This is what you want to see.

Training ⏳

Still learning this page. Give it a few more minutes. If it stays here for over 15 minutes, something might be stuck.

Error ❌

Something went wrong. Common causes:

  • Page requires login: Bot can't access password-protected content

  • Broken links: The page returns a 404 or other error

  • Server issues: Your site was temporarily down during crawling

Fix for failed pages: Click the retry button, or use Add Link to try training that specific page again.
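
If the same pages keep failing, a quick check from outside your own network often shows which of these causes you're dealing with. This is a generic diagnostic sketch (the URLs are placeholders and it assumes the third-party requests library), not a Spur tool:

    # Quick diagnostic for pages stuck in Error status (generic sketch, not a Spur tool).
    import requests

    failing_urls = [
        "https://yourstore.com/pages/shipping",    # replace with the pages showing Error
        "https://yourstore.com/collections/sale",
    ]

    for url in failing_urls:
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException as exc:
            print(f"{url} -> could not connect ({exc}); server down or firewall blocking")
            continue
        if response.status_code == 404:
            print(f"{url} -> 404, the link is broken")
        elif response.status_code in (401, 403):
            print(f"{url} -> {response.status_code}, likely login-protected or bot-blocked")
        elif response.history:
            print(f"{url} -> redirected to {response.url} (check for a login page)")
        else:
            print(f"{url} -> {response.status_code}, looks reachable; try retraining it")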

Advanced Scenarios & Troubleshooting

Password-Protected Content

Spur can't crawl pages behind logins, member-restricted areas, or password protection. If you need this content trained, you have a few options:

  1. Temporarily make those pages public during crawling

  2. Copy the content and add it via the Text data source instead

  3. Create a staging version of your site without password protection

Dynamic Content & JavaScript Sites

Sites built with React, Angular, or heavy JavaScript might not crawl perfectly. The bot sees what search engines see - if Google can't index your content well, neither can we.
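
One quick way to test this yourself: fetch the raw HTML the way a crawler does and check whether text you can see in the browser is actually in it. A minimal sketch, assuming the third-party requests library and placeholder URL and phrase:

    # Does this page's key text exist in the server-rendered HTML?
    # Generic check, not a Spur tool - if it fails, the content is likely injected by JavaScript.
    import requests

    page_url = "https://yourstore.com/products/example-product"  # placeholder URL
    expected_text = "Free shipping on orders over"               # a phrase you can see in the browser

    html = requests.get(page_url, timeout=10).text
    if expected_text.lower() in html.lower():
        print("Found in the raw HTML - crawlers should be able to read it.")
    else:
        print("Not in the raw HTML - likely rendered by JavaScript; consider server-side "
              "rendering or adding this content via the Text data source.")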

Solutions:

  • Enable server-side rendering if possible

  • Use your sitemap to identify important pages that didn't crawl (see the sketch after this list)

  • Manually add key pages with Add Link

  • Copy important dynamic content to the Text data source
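
For the sitemap check mentioned above, a short script can list every URL in your XML sitemap so you can compare it against the pages marked Available in Spur. A minimal sketch, assuming a standard-schema sitemap at /sitemap.xml and the third-party requests library:

    # List every URL in a standard XML sitemap so you can spot pages that never trained.
    # Minimal sketch - assumes your sitemap lives at /sitemap.xml and uses the standard schema.
    import xml.etree.ElementTree as ET

    import requests  # assumed third-party dependency

    SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


    def sitemap_urls(sitemap_url):
        root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
        # Sitemap index files point at more sitemaps; regular sitemaps list <url><loc> entries.
        if root.tag.endswith("sitemapindex"):
            urls = []
            for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS):
                urls.extend(sitemap_urls(loc.text.strip()))
            return urls
        return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


    for url in sitemap_urls("https://yourstore.com/sitemap.xml"):
        print(url)  # compare against the pages showing as Available in Spur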

Multiple Languages

If your site has multiple language versions:

  • Crawl each language separately by starting from different root URLs

  • Example: Crawl yoursite.com/en and yoursite.com/es as separate training sessions

  • The bot will learn to respond in the appropriate language based on customer queries

Subdomain Issues

If your main site and blog/store are on different subdomains:

  • www.yourbrand.com (main site)

  • shop.yourbrand.com (store)

  • blog.yourbrand.com (blog)

Crawl each subdomain separately. Spur's crawler treats these as different websites.

Maintaining Your Website Training

When to Retrain

Your bot won't automatically know about new content. Retrain when:

  • You add new products or services

  • You update pricing or policies

  • You launch new pages or sections

  • You notice the bot giving outdated information

Partial vs Full Retraining

Full retrain: Delete all current website data and crawl from scratch. Do this for major site overhauls.

Partial updates: Use Add Link for individual new pages. Faster and doesn't mess with existing training.

Content That Changes Frequently

For stuff that updates daily (like inventory counts, daily specials), don't rely on website crawling. Use the Text data source instead - it's easier to update quickly.

Quality Control Tips

Test Your Training

After crawling, ask your bot questions about:

  • Product details and specifications

  • Pricing and availability

  • Store policies and procedures

  • Contact information and hours

If it can't answer or gives wrong info, check if those pages trained properly.

Optimize Your Website Content

Make your site more bot-friendly:

  • Use clear headings and structure

  • Write in plain language, avoid jargon

  • Include FAQ sections

  • Make sure important info isn't buried in images or videos

Monitor Failed Pages

Keep an eye on pages that consistently fail to train. These might indicate:

  • Technical issues with your site

  • Content that needs to be restructured

  • Pages that should be excluded from training

Common Problems

"My bot doesn't know about X, but it's on my website"

  • Check if that page shows as Available in your data sources

  • The info might be in an image or video (bot only reads text)

  • Content might be loaded by JavaScript after the page loads

"Crawling is taking forever"

  • Check that your web host isn't rate-limiting our crawlers

"Some pages show as Available but the bot gives wrong answers"

  • The bot might be pulling from an outdated cached version

  • Try retraining that specific page

  • Check if there's conflicting information across different pages


Need help with technical setup?

Most website issues can be resolved by temporarily adjusting your site's crawler restrictions. If you're stuck, contact your developer and walk through the requirements together.
