
How to train Spur's AI on your website

Your website is where your AI agent gets 90% of its knowledge about your products, services, and brand. This guide walks through everything you need to know to get it trained properly.


Before You Start: Website Requirements

Shopify Stores

If you're on Shopify, you're golden. Spur connects directly and can crawl your store without any extra setup.

Custom Websites & Other Platforms

Got a custom website, or using something other than Shopify? You might hit a roadblock before training can start.

Common issue: Many custom websites have bot protection, firewalls, or crawl restrictions that block Spur from reading your content.

The fix: You'll need to temporarily lift these protections so we can crawl your site. Here's what to tell your developer:

  1. Disable bot protection for Spur's crawlers temporarily

  2. Whitelist our crawling IPs (we can provide these)

  3. Remove any robots.txt restrictions that block automated crawling (see the example after this list)

  4. Turn off aggressive firewalls during the crawling period
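
As a rough illustration of point 3, here's what an opened-up robots.txt could look like. This is a sketch, not an official snippet: the user-agent name "SpurBot" is a placeholder, so confirm the real crawler user-agent (and the IPs to whitelist) with Spur support before your developer edits anything.

    # Placeholder user-agent - confirm the actual one with Spur support
    User-agent: SpurBot
    Allow: /

    # All other automated crawlers stay blocked
    User-agent: *
    Disallow: /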

Once we've crawled everything, your developer can turn all the protections back on. Spur only needs access during the initial training and any retraining sessions.

Don't have a developer?

Some website builders (Wix, Squarespace, etc.) have these protections built in. Check your platform's settings for "bot protection," "crawler settings," or "SEO tools" - you might be able to adjust them yourself.

Setting Up Website Training

Step 1: Choose Your Root URL

This is crucial. Use your main homepage URL, not a specific product or category page.

Good examples:

  • yourstore.com

  • shop.yourbrand.com

  • www.yourbusiness.co.uk

Avoid these:

  • yourstore.com/products/bestsellers

  • yourstore.com/collections/winter

  • yourstore.com/pages/about

Why? Spur crawls from your root URL and follows all internal links. Starting from the homepage ensures we discover your entire site structure.
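
To see why the starting point matters, here's a toy sketch of how link-following crawls generally work. It's illustrative only - not Spur's actual crawler - and it assumes the third-party requests library and a placeholder store URL. Started from the homepage it keeps discovering same-domain links; started from a deep page like /collections/winter it only ever finds whatever that page happens to link to.

    # Illustrative toy crawler - NOT Spur's actual implementation.
    # Shows why a crawl that starts at the root URL discovers far more of the site.
    import re
    from collections import deque
    from urllib.parse import urljoin, urlparse

    import requests  # assumed third-party dependency


    def discover(root_url, max_pages=50):
        """Breadth-first walk of same-domain links, starting from root_url."""
        domain = urlparse(root_url).netloc
        seen, queue = {root_url}, deque([root_url])
        while queue and len(seen) < max_pages:
            url = queue.popleft()
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue  # unreachable pages get skipped, just like a failed crawl
            for href in re.findall(r'href="([^"#]+)"', html):  # crude link extraction, fine for a sketch
                absolute = urljoin(url, href)
                if urlparse(absolute).netloc == domain and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
        return seen


    # Starting from the homepage reaches the whole linked structure (up to the cap);
    # starting from a single deep page usually reaches only whatever it links to.
    print(len(discover("https://yourstore.com")))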

Step 2: Crawl vs Add Link

Use Crawl when:

  • You want your bot to know everything on your site

  • Your site has good internal linking between pages

  • You're training for the first time

  • You want comprehensive coverage

Use Add Link when:

  • Adding one specific page after initial crawling

  • You have a page that's not linked from anywhere else

  • Testing how a single page trains before crawling everything

  • Adding landing pages or hidden content

Step 3: Start the Crawling Process

Click Crawl and wait. You can close the browser or navigate away during crawling; the training will continue and, in most cases, won't take more than 5 minutes.

Understanding Crawl Status

Each page shows one of these statuses:

Available ✅

Your bot knows this content and can answer questions about it. This is what you want to see.

Training ⏳

Still learning this page. Give it a few more minutes. If it stays here for over 15 minutes, something might be stuck.

Error ❌

Something went wrong. Common causes:

  • Page requires login: Bot can't access password-protected content

  • Broken links: The page returns a 404 or other error

  • Server issues: Your site was temporarily down during crawling

Fix for failed pages: Click the retry button, or use Add Link to try training that specific page again.
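
If the same pages keep failing, a quick check from outside your own network often shows which of these causes you're dealing with. This is a generic diagnostic sketch (the URLs are placeholders and it assumes the third-party requests library), not a Spur tool:

    # Quick diagnostic for pages stuck in Error status (generic sketch, not a Spur tool).
    import requests

    failing_urls = [
        "https://yourstore.com/pages/shipping",    # replace with the pages showing Error
        "https://yourstore.com/collections/sale",
    ]

    for url in failing_urls:
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException as exc:
            print(f"{url} -> could not connect ({exc}); server down or firewall blocking")
            continue
        if response.status_code == 404:
            print(f"{url} -> 404, the link is broken")
        elif response.status_code in (401, 403):
            print(f"{url} -> {response.status_code}, likely login-protected or bot-blocked")
        elif response.history:
            print(f"{url} -> redirected to {response.url} (check for a login page)")
        else:
            print(f"{url} -> {response.status_code}, looks reachable; try retraining it")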

Advanced Scenarios & Troubleshooting

Password-Protected Content

Spur can't crawl pages behind logins, member-restricted areas, or password protection. If you need this content trained, you have a few options:

  1. Temporarily make those pages public during crawling

  2. Copy the content and add it via the Text data source instead

  3. Create a staging version of your site without password protection

Dynamic Content & JavaScript Sites

Sites built with React, Angular, or heavy JavaScript might not crawl perfectly. The bot sees what search engines see - if Google can't index your content well, neither can we.
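
One quick way to test this yourself: fetch the raw HTML the way a crawler does and check whether text you can see in the browser is actually in it. A minimal sketch, assuming the third-party requests library and placeholder URL and phrase:

    # Does this page's key text exist in the server-rendered HTML?
    # Generic check, not a Spur tool - if it fails, the content is likely injected by JavaScript.
    import requests

    page_url = "https://yourstore.com/products/example-product"  # placeholder URL
    expected_text = "Free shipping on orders over"               # a phrase you can see in the browser

    html = requests.get(page_url, timeout=10).text
    if expected_text.lower() in html.lower():
        print("Found in the raw HTML - crawlers should be able to read it.")
    else:
        print("Not in the raw HTML - likely rendered by JavaScript; consider server-side "
              "rendering or adding this content via the Text data source.")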

Solutions:

  • Enable server-side rendering if possible

  • Use your sitemap to identify important pages that didn't crawl (see the sketch after this list)

  • Manually add key pages with Add Link

  • Copy important dynamic content to the Text data source
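
For the sitemap check mentioned above, a short script can list every URL in your XML sitemap so you can compare it against the pages marked Available in Spur. A minimal sketch, assuming a standard-schema sitemap at /sitemap.xml and the third-party requests library:

    # List every URL in a standard XML sitemap so you can spot pages that never trained.
    # Minimal sketch - assumes your sitemap lives at /sitemap.xml and uses the standard schema.
    import xml.etree.ElementTree as ET

    import requests  # assumed third-party dependency

    SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


    def sitemap_urls(sitemap_url):
        root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
        # Sitemap index files point at more sitemaps; regular sitemaps list <url><loc> entries.
        if root.tag.endswith("sitemapindex"):
            urls = []
            for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS):
                urls.extend(sitemap_urls(loc.text.strip()))
            return urls
        return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


    for url in sitemap_urls("https://yourstore.com/sitemap.xml"):
        print(url)  # compare against the pages showing as Available in Spur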

Multiple Languages

If your site has multiple language versions:

  • Crawl each language separately by starting from different root URLs

  • Example: Crawl yoursite.com/en and yoursite.com/es as separate training sessions

  • The bot will learn to respond in the appropriate language based on customer queries

Subdomain Issues

If your main site and blog/store are on different subdomains:

  • www.yourbrand.com (main site)

  • shop.yourbrand.com (store)

  • blog.yourbrand.com (blog)

Crawl each subdomain separately. Spur's crawler treats these as different websites.

Maintaining Your Website Training

When to Retrain

Your bot won't automatically know about new content. Retrain when:

  • You add new products or services

  • You update pricing or policies

  • You launch new pages or sections

  • You notice the bot giving outdated information

Partial vs Full Retraining

Full retrain: Delete all current website data and crawl from scratch. Do this for major site overhauls.

Partial updates: Use Add Link for individual new pages. Faster and doesn't mess with existing training.

Content That Changes Frequently

For stuff that updates daily (like inventory counts, daily specials), don't rely on website crawling. Use the Text data source instead - it's easier to update quickly.

Quality Control Tips

Test Your Training

After crawling, ask your bot questions about:

  • Product details and specifications

  • Pricing and availability

  • Store policies and procedures

  • Contact information and hours

If it can't answer or gives wrong info, check if those pages trained properly.

Optimize Your Website Content

Make your site more bot-friendly:

  • Use clear headings and structure

  • Write in plain language, avoid jargon

  • Include FAQ sections

  • Make sure important info isn't buried in images or videos

Monitor Failed Pages

Keep an eye on pages that consistently fail to train. These might indicate:

  • Technical issues with your site

  • Content that needs to be restructured

  • Pages that should be excluded from training

Common Problems

"My bot doesn't know about X, but it's on my website"

  • Check if that page shows as Available in your data sources

  • The info might be in an image or video (bot only reads text)

  • Content might be loaded by JavaScript after the page loads

"Crawling is taking forever"

  • Check that your web host isn't rate-limiting our crawlers

"Some pages show as Available but the bot gives wrong answers"

  • The bot might be pulling from an outdated cached version

  • Try retraining that specific page

  • Check if there's conflicting information across different pages


Need help with technical setup?

Most website issues can be resolved by temporarily adjusting your site's crawler restrictions. If you're stuck, contact your developer and walk through the requirements together.
