Generating Interactive 3D Cluster Plots of Vector Embeddings for SEO

Explore my Colab tool that clusters your blog URLs using OpenAI embeddings, reveals duplicate and off-topic content, and visualizes SEO insights in 3D.


I've been seeing a lot of content about embeddings and semantic SEO recently. Earlier this week I saw a post from Cyrus Shepard and read Michael King's article on vector embeddings, and I felt particularly inspired. Even though I don't know much Python myself, I've been experimenting with using AI to bring Python into my workflows, and this felt like the perfect opportunity to do some testing.

So, I built my own semantic clustering tool in Google Colab!

What does this tool actually do?

My Colab notebook analyzes your blog content by grouping your URLs into meaningful clusters based on semantic similarity. It tells you exactly which topics your blog covers effectively, which articles are potentially off-topic, flags duplicate content, and recommends relevant internal linking opportunities to strengthen your topical authority.

How does it work?

This Colab notebook takes a CSV of your URLs and their OpenAI embeddings, then does a full semantic analysis of your content. It:

  • Groups URLs into topic clusters using UMAP for dimensionality reduction and KMeans for clustering
  • Calculates an authority score for each URL based on how closely it aligns with the overall topics you usually discuss
  • Flags off-topic content in the bottom 1% of authority scores
  • Detects duplicate content using cosine similarity
  • Recommends internal linking opportunities for related pages using cosine similarity
  • Uses GPT-3.5 to generate human-readable topic labels for each cluster. These labels aren't always the best, but I think they help with understanding the content a bit.
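
If you're curious what the authority score and off-topic flag might look like in code, here's a minimal sketch. It assumes the authority score is the cosine similarity between each page's embedding and the site-wide average embedding, which is one reasonable definition but not necessarily the exact formula the notebook uses. The function names, URLs, and tiny 3-dimensional vectors are made up for illustration (real OpenAI embeddings have far more dimensions), and the UMAP and KMeans steps are not shown here.

```python
import numpy as np

def authority_scores(embeddings):
    """Cosine similarity of each page embedding to the average (centroid) embedding."""
    X = np.asarray(embeddings, dtype=float)
    # Scale every row to unit length so dot products become cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    centroid = X.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    return X @ centroid

def flag_off_topic(scores, percentile=1.0):
    """Mark pages whose score falls at or below the given percentile."""
    cutoff = np.percentile(scores, percentile)
    return scores <= cutoff

# Hypothetical toy data: three car-care pages and one unrelated page.
urls = ["/wash-tips", "/wax-guide", "/detailing", "/tax-advice"]
vectors = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.85, 0.15, 0.05],
    [0.00, 0.10, 0.90],
]

scores = authority_scores(vectors)
off_topic = flag_off_topic(scores, percentile=1.0)

for url, score, flagged in zip(urls, scores, off_topic):
    note = "  <- potentially off-topic" if flagged else ""
    print(f"{url}: {score:.3f}{note}")
```

On a small site, a bottom-1% cutoff will usually flag only the single lowest-scoring page, so treat the flag as a prompt to review the page rather than a verdict.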

The output is an interactive 3D scatter plot where each point represents a URL. Points are grouped and color-coded by cluster, with duplicates shown as orange diamonds and off-topic pages marked with red Xs. You can rotate, zoom, and hover to explore your site’s structure in a whole new way.

It also creates a CSV with more in-depth recommendations (potentially off-topic content, duplicate content, internal link opportunities, and more).
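
For a sense of how internal link opportunities like the ones in that CSV could be derived, here's a rough sketch that suggests, for each page, the most similar other pages by cosine similarity. The function name, the `top_k` parameter, and the toy URLs and vectors are all hypothetical, and the notebook may rank or filter its suggestions differently.

```python
import numpy as np

def suggest_internal_links(urls, embeddings, top_k=2):
    """For each URL, return the top_k most similar other URLs with their cosine similarity."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = X @ X.T
    # A page should never be suggested as a link target for itself.
    np.fill_diagonal(sim, -np.inf)
    top_k = min(top_k, len(urls) - 1)

    suggestions = {}
    for i, url in enumerate(urls):
        # argsort is ascending, so reverse it to get the most similar pages first.
        best = np.argsort(sim[i])[::-1][:top_k]
        suggestions[url] = [(urls[j], float(sim[i, j])) for j in best]
    return suggestions

# Hypothetical toy data: two washing articles and two pricing pages.
urls = ["/car-wash-tips", "/how-to-wash-a-car", "/membership-pricing", "/monthly-plans"]
vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.8, 0.2],
]

suggestions = suggest_internal_links(urls, vectors, top_k=1)
for source, targets in suggestions.items():
    for target, similarity in targets:
        print(f"{source} -> {target} (similarity {similarity:.2f})")
```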

Here’s an example of the scatter plot for Soapy Joes Car Wash in San Diego:

Image

Here's the output for UHaul:

Image

And then here's a quick look at the Soapy Joes CSV file:

Image

How to use the tool yourself:

1. Generate an OpenAI API key. You're going to have to pay OpenAI some pennies to run the tool (sorry).

2. Generate embeddings in Screaming Frog

  • Choose your AI provider
    Go to Config > API Access > AI, then select OpenAI (you can use other providers, but you may need to edit my code to do so). Add your API key.
  • Load the embeddings prompt
    Go to Config > API Access > Prompt Configuration, click Add from Library, and choose Extract Semantic Embeddings from Page.
  • Connect to the API
    In the same API Access window, under Account Information, click Connect.
  • Enable HTML storage
    Go to Config > Spider > Extraction and check Store HTML and Store Rendered HTML.
  • Enable embeddings
    Go to Config > Content > Embeddings, check Enable Embedding Functionality, and confirm the correct prompt is selected.
  • Run the crawl

3. Make a copy of my Colab notebook

4. Enter your API key

5. Upload a CSV file containing your URLs and their OpenAI embeddings

6. Run the notebook, and it will do the rest. It'll organize your content, visualize clusters, and provide actionable recommendations in a CSV. It may take a bit the first time you run it.

7. Adjust this section and re-run if you are getting too few or too many potential duplicates:

Image
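
To give a feel for steps 5 and 7, here's a minimal sketch that reads a CSV of URLs and embeddings and then counts potential duplicate pairs at two different cutoffs. It assumes the setting you adjust is a cosine-similarity threshold, and that the CSV has an `Address` column for the URL and an `Embeddings` column holding comma-separated numbers. Both column names, the function names, and the sample rows are assumptions for illustration, so check the headers in your own export.

```python
import csv
import io

import numpy as np

def load_embeddings(csv_file, url_col="Address", emb_col="Embeddings"):
    """Read URLs and embedding vectors from an open CSV file (column names are assumed)."""
    urls, vectors = [], []
    for row in csv.DictReader(csv_file):
        raw = (row.get(emb_col) or "").strip().strip("[]")
        if not raw:
            continue  # skip pages that have no embedding
        urls.append(row[url_col])
        vectors.append([float(value) for value in raw.split(",")])
    if len({len(v) for v in vectors}) > 1:
        raise ValueError("Embeddings have inconsistent dimensions")
    return urls, np.array(vectors, dtype=float)

def duplicate_pairs(urls, X, threshold):
    """Return (url_a, url_b, similarity) for every pair at or above the threshold."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = X @ X.T
    # Upper triangle only: each pair appears once and self-pairs are excluded.
    rows, cols = np.triu_indices(len(urls), k=1)
    return [
        (urls[i], urls[j], float(sim[i, j]))
        for i, j in zip(rows, cols)
        if sim[i, j] >= threshold
    ]

# Hypothetical sample standing in for an uploaded file.
sample = io.StringIO(
    "Address,Embeddings\n"
    'https://example.com/a,"1.0,0.0,0.0"\n'
    'https://example.com/b,"[0.99, 0.05, 0.0]"\n'
    'https://example.com/c,"0.8,0.6,0.0"\n'
    "https://example.com/d,\n"
    'https://example.com/e,"0.0,0.0,1.0"\n'
)

urls, X = load_embeddings(sample)
strict = duplicate_pairs(urls, X, threshold=0.95)
loose = duplicate_pairs(urls, X, threshold=0.75)
print(f"threshold 0.95: {len(strict)} pair(s)")
print(f"threshold 0.75: {len(loose)} pair(s)")
```

Raising the threshold leaves only near-identical pages flagged, while lowering it pulls in pages that are merely related, so move it in small steps and re-check the results each time.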

There might be some bugs (I'm not an expert at this), but you can usually resolve anything that comes up by asking ChatGPT!

Send me an email if you have any questions!

NOTE: Screaming Frog just released their own version of this tool. Read this article if you're interested in testing theirs!