- BrianShoff.com - https://brianshoff.com -

Latent Semantic Indexing (LSI) Simplified

Posted By Brian On January 25, 2007 @ 11:13 pm In Tech

Latent semantic indexing: it’s one of the latest ingredients in the algorithms of today’s most progressive search engines. It makes the notion of rating search results by keyword density look childish and produces better results but, more importantly, makes those results harder to spam.

But what the heck is it?

For those of us who aren’t computer science majors or mathematical geniuses, this article explains latent semantic indexing in plain English (or the closest thing to it). Once you understand this method of information retrieval, you can see why it will continue to be a major influence on what we find online.


People Smarter Than Me

Researchers far smarter than me developed the [2] latent semantic analysis technique. When applied to information retrieval, it’s called latent semantic indexing, and it sets out to solve the problem of searching text that contains synonyms and polysemy, among other issues.

  • Synonyms

    Two words that have similar or identical meanings.

  • Example:

    That girl is a student of geometry.
    That girl is a pupil of geometry.

  • Polysemy

    When one word can have two or more different meanings.

  • Example:

    There was a mole in the organization.
    I have a problem with a mole burrowing in my backyard.

Who’s Superior… Man or Machine?

Humans quickly understand these nuances in language, but artificial intelligence can’t. To help it along, smart people develop equations that most humans can’t quickly understand. Ironic.
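Much of the math behind latent semantic analysis is a matrix factorization called singular value decomposition (SVD). The sketch below, which assumes NumPy is installed, builds a tiny term-document matrix from a made-up corpus, keeps only the two strongest latent dimensions, and compares words in that reduced space. “student” and “pupil” never appear in the same document, yet they end up pointing in the same direction because they share neighbors. The corpus, the choice of k = 2, and the `similarity` helper are illustrative assumptions, not any search engine’s actual setup.

```python
import numpy as np

# Hypothetical toy corpus: "student" and "pupil" never share a document,
# but both appear alongside "geometry", "class", and "homework".
docs = [
    "student geometry class",
    "pupil geometry class",
    "student homework",
    "pupil homework",
    "mole garden burrow",
    "garden soil burrow",
]

# Term-document matrix: one row per term, one column per document.
vocab = sorted({word for doc in docs for word in doc.split()})
index = {term: i for i, term in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        A[index[word], j] += 1

# Truncated SVD: keep only the k strongest latent dimensions (k is an assumption).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
term_vectors = U[:, :k] * s[:k]  # each row is one term in the reduced space

def similarity(a, b):
    """Cosine similarity between two terms in the reduced space."""
    va, vb = term_vectors[index[a]], term_vectors[index[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(round(similarity("student", "pupil"), 3))   # close to 1.0
print(round(similarity("student", "garden"), 3))  # close to 0.0
```

In the raw matrix the “student” and “pupil” rows have no overlap at all; the reduced space is what lets their shared context show through. Real systems use far larger corpora, term weighting (for example tf-idf), and many more dimensions.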

Latent semantic indexing is much like [3] semantic networking. Semantic networks are nothing more than word associations. Each one is unique to a given person and helps us determine the relationship between two different items.

Here’s one I made, centered on the word “sleep”:

[Figure: latent semantic indexing map centered on the word “sleep”]

Notice how “bear”, “human”, and “computer” are each related to the root word “sleep”. As you move further away from the root, the items become less closely related to it, like “cyber café”. Anyone who has been to one knows that sleep is the furthest thing from the minds of its patrons.
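One way to picture “moving further away from the root” is as hop counts in a graph. This sketch stores a semantic network as a Python dictionary and uses breadth-first search to measure each word’s distance from “sleep”. The words come from the map described above, but the specific links (for instance, that “cyber café” hangs off “computer”) are assumptions, since the original figure isn’t reproduced here.

```python
from collections import deque

# Hypothetical adjacency list loosely modeled on the "sleep" map.
# The words are from the article; the exact links are assumptions.
# Links are treated as one-way (root outward) for brevity.
network = {
    "sleep": ["bear", "human", "computer"],
    "computer": ["cyber café"],
}

def distances_from(root, graph):
    """Breadth-first search: number of hops from root to every reachable word."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        word = queue.popleft()
        for neighbor in graph.get(word, []):
            if neighbor not in dist:
                dist[neighbor] = dist[word] + 1
                queue.append(neighbor)
    return dist

for word, hops in sorted(distances_from("sleep", network).items(), key=lambda kv: kv[1]):
    print(hops, word)
```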

How Latent Semantic Indexing Relates to Search Engine Ranking

Assume that we just used [4] Wordtracker and found that there are a lot of searches for “cyber café geeks”. It wouldn’t be surprising to see us write an article with the same title. According to conventional SEO, we would make sure the phrase “cyber café geeks” makes up 2% to 10% of the article.
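Keyword density is simple arithmetic, though SEO tools disagree on the exact formula. This sketch assumes density = (phrase occurrences × words in the phrase) ÷ total words × 100; the tokenizer and the sample text are made up for illustration.

```python
import re

def keyword_density(text, phrase):
    """Percentage of the words in text that belong to occurrences of phrase.

    Assumed formula: occurrences * words_in_phrase / total_words * 100.
    """
    words = re.findall(r"[\w']+", text.lower())  # crude tokenizer; drops punctuation
    target = phrase.lower().split()
    n = len(target)
    if not words or n == 0:
        return 0.0
    # Slide a window of n words across the text and count exact matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return 100.0 * hits * n / len(words)

# Made-up sample text for illustration only.
article = ("Cyber café geeks love fast Internet. Many cyber café geeks "
           "also love coffee, but most cyber café geeks just want Wi-Fi.")

density = keyword_density(article, "cyber café geeks")
print(f"{density:.1f}%")
print("within 2-10%:", 2.0 <= density <= 10.0)
```

The sample scores about 40.9%, far above the 2% to 10% band: exactly the kind of stuffing that density-only ranking invites.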

But what if we wanted to take it a step further by using latent semantic indexing? We would need to determine the semantic network that Google, Yahoo!, or MSN uses, right? But how?

Easy. We buy software that does this for us.

This is where the service [5] OptiRanker comes in. You enter a keyword phrase, and it returns the semantic network for that phrase in a given search engine. By creating a list of words in descending order of relevance, we can determine which words to include in our article to increase its relevance. The goal, obviously, is to improve our overall search engine position.
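Once you have a list of related words with relevance scores, finding the gaps in a draft comes down to a sort and a set lookup. OptiRanker’s real output format isn’t described in this article, so the scores below are invented placeholders and `missing_terms` is a hypothetical helper.

```python
import re

# Invented relevance scores standing in for a tool's output for "cyber café geeks".
related = {"coffee": 0.92, "computers": 0.88, "internet": 0.75, "gaming": 0.61, "wifi": 0.40}

def missing_terms(draft, scores, top_n=3):
    """Of the top_n most relevant words, return those the draft never uses."""
    used = set(re.findall(r"[\w']+", draft.lower()))
    ranked = sorted(scores, key=scores.get, reverse=True)  # descending relevance
    return [word for word in ranked[:top_n] if word not in used]

draft = "Geeks love the Internet at any cyber café."
print(missing_terms(draft, related))  # ['coffee', 'computers']
```

Sorting once and slicing keeps the “descending list ordered by relevance” idea explicit; `top_n` simply limits how far down the list you care to look.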

To provide an extremely simplified example, let’s look at two different sentences. According to latent semantic indexing, and using the figure above, which one would rank #1 in the search results for “cyber café geeks”?


1. According to a recent survey of cyber cafés, geeks typically use them for their high-speed Internet connection.

2. Cyber cafés are home to geeks across the world, not because their computers are faster, but because their coffee is better.

Aside from the ridiculous notion that geeks don’t already have a blazing connection to the Internet at home, can you tell which sentence would gain position #1? If you said #2, you’d be correct. Although sentence #1 contains the keyword phrase we’re optimizing for, sentence #2 contains more words found in the semantic network, making it more relevant.
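The comparison above can be mimicked with a toy score: count how many distinct semantic-network words each sentence uses. This mirrors the article’s simplified explanation rather than real LSI, which compares documents as vectors in a reduced SVD space. The network terms are assumptions, because the figure’s full word list isn’t available.

```python
import re

# Hypothetical semantic network for "cyber café geeks"; these words are
# assumptions, not the article's actual figure.
network_terms = {"internet", "computers", "coffee", "faster", "home"}

sentences = {
    1: "According to a recent survey of cyber cafés, geeks typically use them "
       "for their high-speed Internet connection.",
    2: "Cyber cafés are home to geeks across the world, not because their "
       "computers are faster, but because their coffee is better.",
}

def network_score(text, terms):
    """Count how many distinct semantic-network terms appear in text."""
    words = set(re.findall(r"[\w']+", text.lower()))
    return len(words & terms)

# Rank sentence numbers by score, highest first.
ranking = sorted(sentences, key=lambda n: network_score(sentences[n], network_terms), reverse=True)
for n in ranking:
    print(n, network_score(sentences[n], network_terms))
```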

Hopefully this helps clarify how latent semantic indexing works. It’s not the only factor a search engine considers when it returns results. Links, keyword density, credibility, and so on ad nauseam all still play a role.

I wonder what the next step in search technology will be… precognition? Don’t laugh… it’s not that far-fetched.


Article printed from BrianShoff.com: https://brianshoff.com

URL to article: https://brianshoff.com/tech/latent-semantic-indexing-lsi-simplified.htm

URLs in this post:
[1] Image: http://www.optiranker.com
[2] latent semantic analysis: http://en.wikipedia.org/wiki/Latent_semantic_analysis
[3] semantic networking: http://en.wikipedia.org/wiki/Semantic_networks
[4] Wordtracker: http://our.affiliatetracking.net/wordtracker/a/15349
[5] OptiRanker: http://www.optiranker.com
[6] Image: http://www.optiranker.com
