Latent Semantic Indexing (LSI) Simplified

Latent semantic indexing is one of the latest ingredients in the algorithms of today’s most progressive search engines. It makes the notion of ranking search results by keyword density look childish. It produces better search results and, more importantly, makes them harder to spam.

But what the heck is it?

For those of us who aren’t computer science majors or mathematical geniuses, this article will explain latent semantic indexing in plain English (or the closest thing to it). By understanding this new method of information retrieval, you can see why it will continue to be a major influence on what we find online.

People Smarter Than Me

In 1988, Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum and Lynn Streeter developed the latent semantic analysis technique. When applied to information retrieval, it’s called latent semantic indexing, and it sets out to solve the problems of searching text that contains synonyms and polysemous words, among others.

  • Synonyms

    Two words that have similar or identical meaning.

  • Example:

    That girl is a student of geometry.
    That girl is a pupil of geometry.

  • Polysemy

    When one word can have two or more different meanings.

  • Example:

    There was a mole in the organization.
    I have a problem with a mole burrowing in my backyard.
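
To make the synonym problem a bit more concrete, here is a minimal sketch of the idea behind latent semantic analysis. It is not how any search engine actually does it. The three tiny “documents” are made up for illustration, and a homemade power iteration stands in for a real singular value decomposition (SVD) routine so the example needs nothing beyond the standard library. “Student” and “pupil” never appear in the same document, so a plain word-by-word comparison sees nothing in common between them. After squeezing the data down to two latent dimensions, they land in the same place because both keep company with “geometry”.

```python
import math

# Hypothetical toy corpus, made up for this sketch.
docs = ["student geometry", "pupil geometry", "mole garden"]
terms = sorted({word for doc in docs for word in doc.split()})

# Term-document matrix: one row per term, one column per document.
A = [[doc.split().count(term) for doc in docs] for term in terms]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def top_singular_triplets(matrix, k, iterations=200):
    """Approximate the k largest singular triplets with power iteration."""
    transposed = [list(col) for col in zip(*matrix)]
    found = []  # (sigma, u, v) triplets, largest first
    for _ in range(k):
        v = [1.0] * len(transposed)
        for _ in range(iterations):
            # Stay orthogonal to the directions already found.
            for _, _, prev in found:
                proj = dot(v, prev)
                v = [x - proj * p for x, p in zip(v, prev)]
            w = matvec(transposed, matvec(matrix, v))  # (A^T A) v
            length = math.sqrt(dot(w, w))
            v = [x / length for x in w]
        av = matvec(matrix, v)
        sigma = math.sqrt(dot(av, av))
        found.append((sigma, [x / sigma for x in av], v))
    return found


def latent_term_vectors(matrix, k):
    """Place each term in a k-dimensional latent space (rows of U * Sigma)."""
    triplets = top_singular_triplets(matrix, k)
    return [[sigma * u[i] for sigma, u, _ in triplets] for i in range(len(matrix))]


raw = dict(zip(terms, A))
latent = dict(zip(terms, latent_term_vectors(A, k=2)))

print("raw    student~pupil:", round(cosine(raw["student"], raw["pupil"]), 3))
print("latent student~pupil:", round(cosine(latent["student"], latent["pupil"]), 3))
print("latent student~mole: ", round(cosine(latent["student"], latent["mole"]), 3))
```

The raw similarity between “student” and “pupil” is zero, while the latent one comes out very close to one. The sketch only shows the synonym case; it makes no attempt to untangle the two meanings of “mole”.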

Who’s Superior… Man or Machine?

Humans can quickly understand these nuances in language; artificial intelligence can’t. To help it along, smart people develop equations that most humans can’t quickly understand. Ironic.

Latent semantic indexing is much like semantic networking. Semantic networks are nothing more than word associations. Each one is unique to a given person and helps us determine the relationship between two different items.

Here’s one I made, centered on the word “sleep”:

[Figure: latent semantic indexing map]

Notice how “bear”, “human”, and “computer” are each directly related to the root word “sleep”. As you move further away from the root, the items become less and less related to it, like “cyber café”. Anyone who has been to one knows that sleep is the furthest thing from the minds of its patrons.
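
If you like to see things as code, a semantic network is just a graph, and “further away from the root” is just the number of hops between two words. Here is a rough sketch. Only the links from “sleep” to “bear”, “human”, and “computer” come from my map as described above; hanging “cyber café” off of “computer” is an assumption made for the example.

```python
from collections import deque

# Word associations as an adjacency list.
# Assumption: "cyber café" is reached through "computer".
network = {
    "sleep": ["bear", "human", "computer"],
    "bear": ["sleep"],
    "human": ["sleep"],
    "computer": ["sleep", "cyber café"],
    "cyber café": ["computer"],
}


def distances_from(root, graph):
    """Breadth-first search: number of hops from root to every reachable word."""
    distances = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


distances = distances_from("sleep", network)
for word, steps in sorted(distances.items(), key=lambda item: item[1]):
    print(steps, word)
```

The fewer the hops, the stronger the association with the root word.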

How Latent Semantic Indexing Relates to Search Engine Ranking

Assume that we just used Trellian’s Keyword Discovery or Wordtracker and found that there are a lot of searches for “cyber café geeks”. It wouldn’t be surprising to see us write an article with the same title. According to conventional SEO, we would make sure the phrase “cyber café geeks” makes up 2% to 10% of the article.
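
For the curious, keyword density is easy to work out yourself. The sketch below assumes one common definition: the words belonging to exact occurrences of the phrase, divided by the total word count. Other tools count it differently, and the `keyword_density` helper and the sample text are made up for this example.

```python
import re


def keyword_density(text, phrase):
    """Share of the words in text that belong to exact occurrences of phrase."""
    words = re.findall(r"\w+", text.lower())
    target = re.findall(r"\w+", phrase.lower())
    if not words or not target:
        return 0.0
    size = len(target)
    # Count every position where the phrase appears word for word.
    hits = sum(
        1 for i in range(len(words) - size + 1) if words[i:i + size] == target
    )
    return hits * size / len(words)


sample = "Cyber café geeks love coffee. Cyber café geeks love speed."
density = keyword_density(sample, "cyber café geeks")
print(f"{density:.0%}")
```

A ten-word sample with two occurrences of a three-word phrase is obviously way over the 2% to 10% range, which is exactly the kind of stuffing that keyword density rewards.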

But what if we wanted to take it a step further by using latent semantic indexing? We would need to determine the semantic network of Google, Yahoo!, or MSN, right? But how?

Easy. We buy software that does this for us.

This is where the service OptiRanker comes in. It allows you to enter a keyword phrase and returns the semantic network for that phrase in a given search engine. Because the words come back as a list in descending order of relevance, we can determine which ones should be included in our article to increase its relevance. The goal, obviously, is to improve our overall search engine position.

To provide an extremely simplified example, let’s look at two different sentences. According to latent semantic indexing, and using the figure above, which one would rank #1 in the results for a search on “cyber café geeks”?


1. According to a recent survey of cyber cafés, geeks typically use them for their high-speed Internet connection.

2. Cyber cafés are home to geeks across the world, not because their computers are faster, but because their coffee is better.

Aside from the ridiculous notion that geeks don’t already have a blazing connection to the Internet at home, can you tell which sentence would gain position #1? If you said #2, you’d be correct. Although sentence #1 contains the keyword phrase we’re optimizing for, sentence #2 contains more words found in the semantic network, making it more relevant.
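
Here is the same comparison as a toy script. The set of “network” words is invented for illustration (it is not output from OptiRanker or from any search engine), the plural handling is deliberately crude, and counting matches is a stand-in for whatever scoring a real engine might use.

```python
import re

# Hypothetical semantic network for "cyber café geeks"; made up for this sketch.
network_terms = {"cyber", "café", "geek", "computer", "coffee", "internet"}

sentence_1 = (
    "According to a recent survey of cyber cafés, geeks typically use them "
    "for their high-speed Internet connection."
)
sentence_2 = (
    "Cyber cafés are home to geeks across the world, not because their "
    "computers are faster, but because their coffee is better."
)


def singular(word):
    """Crude plural stripping so that 'geeks' matches 'geek'."""
    return word[:-1] if word.endswith("s") else word


def network_score(sentence, terms):
    """Count how many distinct network terms show up in the sentence."""
    words = {singular(w) for w in re.findall(r"\w+", sentence.lower())}
    return len(words & terms)


score_1 = network_score(sentence_1, network_terms)
score_2 = network_score(sentence_2, network_terms)
print("Sentence 1:", score_1)
print("Sentence 2:", score_2)
```

With this made-up list, sentence #2 matches one more network word than sentence #1 does, which is the whole argument in miniature.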

Hopefully this helps clarify how latent semantic indexing works. Keep in mind that it’s not the only thing considered when a search engine returns its results. Linking, keyword density, credibility, and plenty of other factors all still play a role.

I wonder what the next step in search technology will be… precognition? Don’t laugh… it’s not that far-fetched.

4 Responses to “Latent Semantic Indexing (LSI) Simplified”

  1. cherubin says:

    Thank you, thank you, thank you. I have been trying to understand this process for over two years now, and with one simple blog…………
    Now I get it!!!

  2. Brian says:

    If only everyone was as enthusiastic about my blog as you are Chris. ;-)

  3. deeny says:

    This is so cool. It is simply mapping out your sentence structure. Almost like a family tree. Thanks for simplifying one of the most misunderstood computer concepts.

  4. Dr. E. Garcia says:

    I disagree. I highly question that Optiranker and similar tools actually use SVD or LSI. I also question similar tools that claim to compute LSI scores as a current search engine would. Many SEOs are notorious for spreading such snakeoil marketing and LSI myths.

    Dr. E. Garcia

Creative Commons License

This work is licensed under a Creative Commons Attribution 2.5 License.