Alexa - ia_archiver and the Wayback Machine
The Wayback Machine provides a historical record of how a website looked in the past.
[Image: The Wayback Machine from Alexa]
ia_archiver is the user-agent name of Alexa's web crawler, which gathers the pages stored in the Wayback Machine.
What the Internet Archive, which runs the Wayback Machine, says:
The Internet Archive is a 501(c)(3) non-profit that was founded to build an Internet library. Its purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format.
How to remove content from the Wayback Machine's archives
[Image: How the Wayback Machine tells you how to remove your content]
If you add an ia_archiver entry to your robots.txt file, for example:

User-agent: ia_archiver
Disallow: /

then the next time Alexa's crawler visits your website it will not only stop archiving your pages but also remove your existing pages from its archive.
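One way to check that such an entry says what you intend is to run it through a robots.txt parser. The sketch below uses Python's standard-library urllib.robotparser; the example.com address is hypothetical, and a parser agreeing with you only shows how a standards-following crawler reads the file, not how the Wayback Machine itself will act on it.

```python
from urllib.robotparser import RobotFileParser

# The two lines that exclude Alexa's ia_archiver crawler from the whole site.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/index.html"  # hypothetical site
    print(is_allowed(ROBOTS_TXT, "ia_archiver", page))  # False: ia_archiver is excluded
    print(is_allowed(ROBOTS_TXT, "Googlebot", page))    # True: other crawlers are unaffected
```

Because the entry names ia_archiver rather than `*`, it leaves ordinary search engines free to index the site.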
For your pages to be removed from the Wayback archive, your website must be online and its robots.txt file accessible. If you have taken your website offline, your content will remain in the archive for as long as the Wayback service exists. This can be a problem if the website no longer exists but its archived pages still turn up in people's searches.
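Since the exclusion only works while robots.txt can actually be fetched, it is worth confirming that it sits at the root of your host and is reachable. The following is a minimal sketch using only Python's standard library; the helper names robots_txt_url and robots_txt_is_reachable are hypothetical, and treating any 2xx response as "accessible" is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the host that serves page_url.

    Crawlers only look for robots.txt at the root of a host, so the
    path, query and fragment of page_url are discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def robots_txt_is_reachable(page_url: str, timeout: float = 10.0) -> bool:
    """Return True if the host's robots.txt can be fetched now with a 2xx status."""
    try:
        with urlopen(robots_txt_url(page_url), timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError (e.g. a 404), DNS failures and timeouts;
        # ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    site_page = "https://www.example.com/blog/2010/old-post.html"  # hypothetical
    print(robots_txt_url(site_page))
```

Calling robots_txt_is_reachable on a page of your own site tells you whether the file can be fetched right now; it says nothing about when the crawler will next visit.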
Copyright Material and the Wayback Machine
This can also cause trouble if someone presses you to remove copyrighted images or content from pages that the Alexa crawler has archived. Because archiving is automatic, the people running the archive are probably not even aware that the material is there. I therefore think you are well within your rights to say that the copyright holder needs to take the matter up with the Internet Archive, which runs the Wayback Machine. At the same time, check that the offending material is no longer hosted on your website.
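You can always remove ALL of your pages from the Wayback Machine with the robots.txt entry above, but you may want some pages to remain. robots.txt rules can name individual paths rather than the whole site, although I have not found a way to confirm that the Wayback Machine honours such selective exclusions.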
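For reference, this is what a path-scoped rule looks like and how a standards-following parser reads it. The paths /old-photos/ and /about-me.html are hypothetical, and whether the Wayback Machine applies a partial exclusion like this to pages it has already archived is an assumption you would need to verify; the sketch only shows the robots.txt semantics, using Python's urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: only /old-photos/ and one page are hidden from ia_archiver;
# everything else stays crawlable.
SELECTIVE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /old-photos/
Disallow: /about-me.html
"""


def archiver_may_fetch(url: str, robots_txt: str = SELECTIVE_ROBOTS_TXT) -> bool:
    """Return True if the rules let ia_archiver fetch url (Disallow is a path-prefix match)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("ia_archiver", url)


if __name__ == "__main__":
    for url in [
        "https://www.example.com/index.html",
        "https://www.example.com/old-photos/beach.jpg",
        "https://www.example.com/about-me.html",
    ]:
        print(url, archiver_may_fetch(url))
```

Each Disallow value is matched as a prefix of the URL path, so the /old-photos/ rule covers everything beneath that directory.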
Personally, I don't mind that all of my pages have been removed from the archive. I removed them because someone who had found an old entry in the Wayback Machine had wrongly assumed my identity and was hassling me.