Some duplicate content is created by negligence, such as linking to /page, /page/ and /page/index.html, which are one and the same document if the web server is configured to serve directory indexes and to redirect. Other dupes are created by scrapers wanting to re-use interesting content. If you're concerned or affected, here's help with on-site duplicates and those created by scrapers.
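To make the /page, /page/ and /page/index.html case concrete, here is a minimal sketch of how internal links could be normalised before they are written into a page. The function name canonical_url, the list of index file names and the rule that an extension-less last segment is a directory are all my assumptions for illustration; your server's configuration decides what is really equivalent.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the server treats these file names as the directory index.
INDEX_NAMES = ("index.html", "index.htm")


def canonical_url(url: str) -> str:
    """Collapse /page, /page/ and /page/index.html into one form (/page/)."""
    parts = urlsplit(url)
    path = parts.path or "/"

    # Strip a trailing index file name: /page/index.html -> /page/
    for name in INDEX_NAMES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break

    # Assumption: a last segment without a dot is a directory: /page -> /page/
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment and "." not in last_segment:
        path += "/"

    # Keep the query string, drop the fragment (it never reaches the server).
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


if __name__ == "__main__":
    for link in ("http://example.com/page",
                 "http://example.com/page/",
                 "http://example.com/page/index.html"):
        print(link, "->", canonical_url(link))
```

Linking to one form only, and redirecting the others to it, leaves the search engines with a single address per document.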
Google officially asks users again for reports about paid links and other forms of spam.
Udi Manber details some of the issues coders have to tackle when trying to improve search result quality and ranking. Obviously he doesn't share any secrets, but it would help some people to understand the process, and to see it as more than a sequence of HTML tags paired with purchased links and blog spam.
Tim Bray states what must be, if you're past your teens, the obvious: namely that Facebook and, by implication and according to the comments, social network sites in general are just a waste of time. Who would have thought?
Google's Webmaster Central Blog clarifies the use or uselessness of various meta tags, explaining which are and always have been superfluous, and the few that can still be helpful. I don't know if unavailable_after:[date]: is obeyed - I bet it is treated like 404s: as long as someone links to an address, Google believes it's proper to consider a page for inclusion in its results, irrespective of whether it still exists.
One more place to stuff lots of instructions addressed to spiders comes from Yahoo, whose spiders now obey a set of newly introduced HTTP X-Headers. The one somewhat useful application I can see is for those site administrators who sprinkle files all over the place instead of organising them according to need and purpose: they can exclude specific file types such as PDFs by adding an HTTP header.
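As a rough sketch of that PDF case, assuming the header in question is X-Robots-Tag (the announcement isn't quoted here, so treat the name and the "noindex" value as my assumption), a small WSGI wrapper could add it to every response whose path ends in .pdf. The name noindex_pdfs is made up for this example; most people would set the header in the web server's configuration instead.

```python
def noindex_pdfs(app):
    """WSGI middleware: add 'X-Robots-Tag: noindex' to responses for *.pdf paths."""

    def wrapped(environ, start_response):
        # Decide by the requested path only; a Content-Type check would also work.
        is_pdf = environ.get("PATH_INFO", "").lower().endswith(".pdf")

        def patched_start(status, headers, exc_info=None):
            if is_pdf:
                # Assumed header name and value, see the note above.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)

    return wrapped
```

Wrapping an existing application is one line, application = noindex_pdfs(application), and pages other than PDFs are passed through untouched.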
Google's Online Security Blog asks for help in finding "malware". Google could do a lot [more] to help themselves by treating redirects differently, and more so by proper page analysis.
If you looked at one of the pages purged last week because of the malware infestation it could cause on unsuspecting and unprotected users, it was obvious that the page itself contained no more than utter rubbish. No one speaks or writes like that. And if they do, they're locked up. I'm sure Google, of all bodies, has the ability to discern what is expressed meaning and what is just a list of words drawn from a large list of totally unrelated subjects.
Google's Webmaster pages now state that selling links [and being caught] doesn't just reduce your PageRank; it may also "negatively impact a site's ranking in search results".
Now that openly sold links have been demoted, Google goes after paid bloggers who write about paying firms or their products in order to link to their sites, if you believe PayPerPost's T. Murphy.
Links can be classed into two groups: real ones and fake ones, the fake ones being those where money or some other form of consideration changes hands. Strangely, most commercial sites try to get fake ones and then to hide their fakeness in such a way that the search engines' [software] quality control regards them as real links. And it's this mindset that makes these sources worthless. It's not how many people you can pay to say you're cool. It's how cool you are that matters.
When a site is finished seems to be governed by how cool it looks these days. Functionality and reliability, it appears, are add-ons to be implemented at a later date. This is probably why I find more and more pages advertising the fact that a product is unavailable.
Returning these almost always very elaborate pages with a 404 status code wouldn't cost a penny more, but it'd save a lot of people a lot of time, as search engines would be prevented from indexing pages saying, for want of a better phrase, that you drew a blank.
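A minimal sketch of the idea as a WSGI application: the visitor still gets the friendly "sorry" page, but it goes out with a 404 status so spiders know there is nothing to index. The PRODUCTS table, the URL layout and the name product_app are invented for illustration and are not taken from any real shop.

```python
# Hypothetical catalogue: product id -> name. Anything else counts as unavailable.
PRODUCTS = {"1001": "Blue widget"}


def product_app(environ, start_response):
    """Serve a product page, or the 'sorry' page together with a 404 status."""
    # Assumed URL layout: /products/<id>
    product_id = environ.get("PATH_INFO", "").strip("/").split("/")[-1]
    name = PRODUCTS.get(product_id)

    if name is None:
        # The page can be as elaborate as you like; only the status line changes.
        status = "404 Not Found"
        body = b"<h1>Sorry, this product is not available.</h1>"
    else:
        status = "200 OK"
        body = ("<h1>%s</h1>" % name).encode("utf-8")

    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]
```

The only difference between the two branches that a spider cares about is the status line; the human visitor sees a page in both cases.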
The same applies to ajaxified or web 2.0 sites - for you and me, that's sites requiring JavaScript in the user's browser. For this very reason Google posted some common sense advice in their Webmaster Central Blog for those developers who never bothered to look at the HTTP protocol. By using the term "Ajax" in the body copy, however, the thing is almost promoted to insider knowledge.
How useless meta tags have become should be apparent when you search at Google for the term produltproben [it's a typo] and check the 50 or so results Google provides. You won't find the well known and well linked site [geizkragen.de] where I found the word in its Meta Keywords, although you will find pages that have what appears to be a generous copy of geizkragen.de's meta strings in their page body.
Yahoo, btw, still regards Meta Keywords as valid information.
© Copyright 1998 - 2008 Klaus Schallhorn.