SEO Theories and Research
Many search engine optimization (SEO) sites tell you what you should do to optimize your site with such conviction that you assume their theories are based on solid analysis. However, this isn’t always the case: many theories lack enough supporting evidence to be considered fact, while others turn out to be outright myths. This section covers some of these theories and myths along with actual empirical research.
10 Graphs Reveal Web Spam Patterns
They say a picture is worth a thousand words. So here's 10,000 words of web spam data from a research paper titled Detecting Spam Web Pages through Content Analysis by Alexandros Ntoulas et al.
Detecting Cloaking Algorithmically Is Not Easy
Recently Matt Cutts posted a question on his blog asking what people thought Google's Web Spam team should focus on next. Mixed in among the answers were requests to eliminate cloaking. Some even went so far as to list offending sites. What interests me is this: since cloaking isn't new, is there something tricky about detecting it that has kept Google from eliminating it from their results?
Detecting Splogs with Self-Similarity Analysis
A while back Google's search results were littered with splogs (spam blogs). It was common to search for a term, click on the first result, and land on a page that had advertising fill the area above the fold and useless content below. I'm not sure when, but the problem must have reached critical mass because Google cleaned up the results. And as with most things Google, the cleanup was very likely algorithmic and automated. Ever wonder how they might have accomplished this feat? A research paper from May 8, 2007 titled Splog Detection Using Self-Similarity Analysis on Blog Temporal Dynamics may be the answer.
Using Link Structures to Classify Web Spam
In an earlier post I summarized content from a research paper that provided a Web Spam Taxonomy. That paper is a few years old, but I believe it still provides a good foundation for discussions regarding web spam. In this post, I'm going to walk through a document titled Improving Web Spam Classifiers Using Link Structure written by Qingqing Gan and Torsten Suel of the CIS Department at Polytechnic University in Brooklyn, NY. In the world of information retrieval research, this paper is quite current, having been published in May 2007.
Web Spam Taxonomy
I came across an interesting research paper the other day titled Web Spam Taxonomy. How could I resist that title!? The paper was written by Zoltan Gyongyi and Hector Garcia-Molina while in the Computer Science Department of Stanford University. The authors also acknowledge many discussions with an anonymous collaborator at a major search engine as a source of information.
Is BrowseRank The New PageRank?
I've been on a research paper reading binge recently. I've got about 10 or so under my belt now in just the last couple of weeks. I've discovered they make great reading on my train ride to work. Relatively short and to the point. Sure they're often full of crazy math formulas, but those are easy to gloss over and instead concentrate on the discussion. Many of the papers were written years ago. Despite their age, the information is... ummm you know... informative. I mean that. My most recent reading, BrowseRank: Letting Web Users Vote for Page Importance, is actually from 2008 which makes it both informative and relevant to future SEO efforts.
Statistically Speaking, That Page Is Spam
In a previous post I covered a Microsoft Research paper that discussed how static factors could be used to improve search ranking results above and beyond what PageRank alone could do. Rolled together, these factors formed what the authors of the paper called fRank, a measure of the quality of a web page. In this post I'm going to cover another research paper that looks at the other end of the quality spectrum. That is, what can be done algorithmically to identify a given page or domain as spam? Note that this post is based on a 2004 SIGIR paper titled Spam, Damn Spam, and Statistics.
fRank Takes on PageRank
Where have I been? That's the question that my readers (both of you, not including my brother) may have asked in the last couple of months. I've been where I've always been, but I've been reading much more than I've been writing. Some of that reading has been research papers of the sort put out by the International World Wide Web Conference Committee or the Special Interest Group on Information Retrieval. Fancy names for fancy groups putting out fancy research papers.
Updating Links: An SEO Red Flag?
Following on the heels of Eric Lander's NoFollow: An SEO Red Flag?, I thought I'd pose the question of whether updating inbound links may also be a red flag.
In You, Google Trusts
As the debate goes on about the decreasing importance of PageRank, another measure, commonly called TrustRank, continues to gain traction in the SEO world. The idea behind TrustRank is that Google (and other search engines) assigns a level of trust to a web site, or maybe even a web site owner, which in turn can help with index inclusion and rankings.
The History of Latent Semantic Indexing
It's sometimes fun (well, if you're involved with SEO) to look at how theories form, come to seem true, and are still being debated years afterwards. Such is the case with Latent Semantic Indexing, or LSI.



