Before You Start: Website Requirements
Shopify Stores
If you're on Shopify, you're golden. Spur connects directly and can crawl your store without any extra setup.
Custom Websites & Other Platforms
Got a custom website or using something other than Shopify? You might hit a roadblock first.
Common issue: Many custom websites have bot protection, firewalls, or crawl restrictions that block Spur from reading your content.
The fix: You'll need to temporarily lift these protections so we can crawl your site. Here's what to tell your developer:
Disable bot protection for Spur's crawlers temporarily
Whitelist our crawling IPs (we can provide these)
Remove any robots.txt restrictions that block automated crawling
Turn off aggressive firewalls during the crawling period
Once we've crawled everything, your developer can turn all the protections back on. Spur only needs access during the initial training and any retraining sessions.
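If your developer wants to confirm the robots.txt part before you start a crawl, here is a minimal sketch using Python's standard library. It checks which URLs a given robots.txt would block for a named crawler. The user-agent name ExampleCrawler and the sample rules are placeholders, not Spur's real values, so ask Spur support for the actual crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent name; replace it with the one Spur support gives you.
CRAWLER_AGENT = "ExampleCrawler"


def blocked_urls(robots_txt: str, agent: str, urls: list[str]) -> list[str]:
    """Return the URLs that the given robots.txt text disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Sample rules for illustration only; paste in your own robots.txt contents.
sample_robots = """\
User-agent: *
Disallow: /checkout/
Disallow: /account/
"""

pages = [
    "https://yourstore.com/",
    "https://yourstore.com/checkout/cart",
]
print(blocked_urls(sample_robots, CRAWLER_AGENT, pages))
```

Any URL in the printed list is one the crawler would be told to skip. This only covers robots.txt rules; firewalls and bot protection have to be checked separately.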
Don't have a developer?
Some website builders (Wix, Squarespace, etc.) have these protections built in. Check your platform's settings for "bot protection," "crawler settings," or "SEO tools" - you might be able to adjust them yourself.
Setting Up Website Training
Step 1: Choose Your Root URL
This is crucial. Use your main homepage URL, not a specific product or category page.
Good examples:
yourstore.com
shop.yourbrand.com
www.yourbusiness.co.uk
Avoid these:
yourstore.com/products/bestsellers
yourstore.com/collections/winter
yourstore.com/pages/about
Why? Spur crawls from your root URL and follows all internal links. Starting from the homepage ensures we discover your entire site structure.
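If you only have a deep link handy, a small sketch like the one below trims it back to the root. The function name root_url is just for illustration, and it assumes https when no scheme is given.

```python
from urllib.parse import urlsplit


def root_url(url: str) -> str:
    """Strip the path, query, and fragment, keeping only scheme and host."""
    # urlsplit only recognises the host when a scheme is present, so add
    # https:// (an assumption) for bare inputs like "yourstore.com/pages/about".
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


print(root_url("yourstore.com/products/bestsellers"))  # https://yourstore.com
print(root_url("https://shop.yourbrand.com/collections/winter?page=2"))  # https://shop.yourbrand.com
```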
Step 2: Crawl vs Add Link
Use Crawl when:
You want your bot to know everything on your site
Your site has good internal linking between pages
You're training for the first time
You want comprehensive coverage
Use Add Link when:
Adding one specific page after initial crawling
You have a page that's not linked from anywhere else
Testing how a single page trains before crawling everything
Adding landing pages or hidden content
Step 3: Start the Crawling Process
Click Crawl and wait. You can close the browser or navigate away during crawling. Training continues in the background and, in most cases, takes no more than 5 minutes.
Understanding Crawl Status
Each page shows one of these statuses:
Available ✅
Your bot knows this content and can answer questions about it. This is what you want to see.
Training ⏳
Still learning this page. Give it a few more minutes. If it stays here for over 15 minutes, something might be stuck.
Error ❌
Something went wrong. Common causes:
Page requires login: The bot can't access password-protected content
Broken links: The page returns a 404 or other error
Server issues: Your site was temporarily down during crawling
Fix for failed pages: Click the retry button, or use Add Link to try training that specific page again.
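Before retrying, it can help to see what the page returns to a plain request. The sketch below is one way to do that. The mapping from HTTP status codes to causes is our rough guess based on the list above, not an official description of how Spur reports errors.

```python
import urllib.error
import urllib.request


def likely_cause(status: int) -> str:
    """Map an HTTP status code to a probable reason a page failed to train."""
    if 200 <= status < 300:
        return "page reachable"
    if status in (401, 403):
        return "login required or crawler blocked"
    if status in (404, 410):
        return "broken link"
    if status == 429:
        return "rate limited"
    if 500 <= status < 600:
        return "server issue"
    return "unexpected status"


def check_page(url: str, timeout: float = 10.0) -> str:
    """Fetch `url` once and describe the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return likely_cause(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return likely_cause(err.code)
    except urllib.error.URLError as err:
        return f"could not connect: {err.reason}"


print(likely_cause(404))
```

Call check_page with the full URL of a failed page, for example check_page("https://yourstore.com/pages/about"). If it reports "page reachable" but the page still fails to train, bot protection aimed at crawlers is a likely suspect.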
Advanced Scenarios & Troubleshooting
Password-Protected Content
Spur can't crawl pages behind logins, member-restricted areas, or password protection. If you need this content trained:
Temporarily make those pages public during crawling
Copy the content and add it via the Text data source instead
Create a staging version of your site without password protection
Dynamic Content & JavaScript Sites
Sites built with React, Angular, or heavy JavaScript might not crawl perfectly. The bot sees what search engines see - if Google can't index your content well, neither can we.
Solutions:
Enable server-side rendering if possible
Use your sitemap to identify important pages that didn't crawl
Manually add key pages with Add Link
Copy important dynamic content to the Text data source
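For the sitemap check, here is a minimal sketch that compares the pages in a standard sitemap.xml with the pages that trained. It assumes you have copied the trained URLs out of your data sources list by hand; the function names and sample data are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


def missing_pages(sitemap_xml: str, trained_urls: list[str]) -> list[str]:
    """Return sitemap URLs that do not appear in the trained list."""
    # Ignore trailing slashes so "/pages/faq" and "/pages/faq/" match.
    trained = {url.rstrip("/") for url in trained_urls}
    return [u for u in sitemap_urls(sitemap_xml) if u.rstrip("/") not in trained]


sample_sitemap = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourstore.com/</loc></url>
  <url><loc>https://yourstore.com/pages/shipping</loc></url>
  <url><loc>https://yourstore.com/pages/returns</loc></url>
</urlset>
"""
trained = ["https://yourstore.com", "https://yourstore.com/pages/shipping/"]

print(missing_pages(sample_sitemap, trained))
```

Each URL it prints is a candidate for Add Link. Sitemap index files, which point to other sitemaps, are not handled here.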
Multiple Languages
If your site has multiple language versions:
Crawl each language separately by starting from different root URLs
Example: Crawl yoursite.com/en and yoursite.com/es as separate training sessions
The bot will learn to respond in the appropriate language based on customer queries
Subdomain Issues
If your main site and blog/store are on different subdomains:
www.yourbrand.com (main site)
shop.yourbrand.com (store)
blog.yourbrand.com (blog)
Crawl each subdomain separately. Spur's crawler treats these as different websites.
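If you're not sure how many separate crawls you need, you can group a list of your important URLs by hostname. This is a small sketch with made-up URLs; each key in the result is a site you'd crawl on its own.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls: list[str]) -> dict[str, list[str]]:
    """Group full URLs (with scheme) by hostname, one group per crawl."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        # hostname is lowercased and excludes any port number.
        host = urlsplit(url).hostname or ""
        groups[host].append(url)
    return dict(groups)


important_pages = [
    "https://www.yourbrand.com/about",
    "https://shop.yourbrand.com/products/bestsellers",
    "https://blog.yourbrand.com/2024/launch",
    "https://shop.yourbrand.com/collections/winter",
]

for host, pages in group_by_host(important_pages).items():
    print(host, len(pages))
```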
Maintaining Your Website Training
When to Retrain
Your bot won't automatically know about new content. Retrain when:
You add new products or services
You update pricing or policies
You launch new pages or sections
You notice the bot giving outdated information
Partial vs Full Retraining
Full retrain: Delete all current website data and crawl from scratch. Do this for major site overhauls.
Partial updates: Use Add Link for individual new pages. Faster and doesn't mess with existing training.
Content That Changes Frequently
For stuff that updates daily (like inventory counts, daily specials), don't rely on website crawling. Use the Text data source instead - it's easier to update quickly.
Quality Control Tips
Test Your Training
After crawling, ask your bot questions about:
Product details and specifications
Pricing and availability
Store policies and procedures
Contact information and hours
If it can't answer or gives wrong info, check if those pages trained properly.
Optimize Your Website Content
Make your site more bot-friendly:
Use clear headings and structure
Write in plain language, avoid jargon
Include FAQ sections
Make sure important info isn't buried in images or videos
Monitor Failed Pages
Keep an eye on pages that consistently fail to train. These might indicate:
Technical issues with your site
Content that needs to be restructured
Pages that should be excluded from training
Common Problems
"My bot doesn't know about X, but it's on my website"
Check if that page shows as Available in your data sources
The info might be in an image or video (the bot only reads text)
Content might be loaded by JavaScript after the page loads
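One way to test the JavaScript theory is to check whether a phrase appears in the page's raw HTML, before any scripts run. The sketch below does that with Python's built-in HTML parser. It assumes you have saved the page source (View Source, not the rendered page) as a string, and it only approximates what a crawler might see.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def phrase_in_static_html(html: str, phrase: str) -> bool:
    """Return True if `phrase` appears in the text of the raw HTML."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so line breaks in the markup don't hide a match.
    text = " ".join(" ".join(parser.chunks).split())
    return phrase.lower() in text.lower()


static_page = "<html><body><h1>Returns</h1><p>Free returns within 30 days.</p></body></html>"
js_page = (
    "<html><body><div id='app'></div>"
    "<script>render('Free returns within 30 days.')</script></body></html>"
)

print(phrase_in_static_html(static_page, "free returns within 30 days"))  # True
print(phrase_in_static_html(js_page, "free returns within 30 days"))      # False
```

If the phrase is missing from the raw HTML, the content is probably injected by JavaScript, and adding it through the Text data source is the safer route.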
"Crawling is taking forever"
Check that your web host isn't rate-limiting our crawlers
"Some pages show as Available but the bot gives wrong answers"
The bot might be pulling from an outdated cached version
Try retraining that specific page
Check if there's conflicting information across different pages
Need help with technical setup?
Most website issues can be resolved by temporarily adjusting your site's crawler restrictions. If you're stuck, contact your developer and walk through the requirements together.