From
http://forum.enjoysudoku.com/players-forum-future-proofing-t35870.html:
tarek wrote:The forum and the mirrored resources are important for many of us & the need for future proofing is always there. I hope that all moderators can still access the servers' data backup. I will from time to time try to take a snapshot of the forum, which I encourage others to do as well.
I have no idea whether tarek still does this, nor how he does it.
denis_berthier wrote:Sure, if anyone knows how to do this and if this is possible without having any admin password.
I guess it would also require a huge amount of disk space.
I don't think I'd be given a backup copy of the web site, even if I asked very very nicely!!!!!
The problem for a guest (not logged in) or registered (logged in) user is to avoid using excessive bandwidth, which can result in being blocked or banned!
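One way to keep bandwidth use modest is to force a minimum gap between requests. The following is only a minimal sketch of that idea, not something the forum prescribes: the `Throttle` class, its parameter names and the 5-second interval in the usage note are all my own assumptions.

```python
import time


class Throttle:
    """Enforce a minimum gap (in seconds) between successive requests.

    Hypothetical helper for illustration; the clock and sleep functions
    are injectable so the behaviour can be checked without really waiting.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to keep requests min_interval apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In use, you would create something like `Throttle(5.0)` once and call `wait()` before every page download. What interval is actually acceptable to the forum's server is unknown to me, so err on the slow side.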
The registered bots (Google, Bing, etc.) are also limited in which pages they can 'crawl' through a configuration file (typically robots.txt) on the web server side (the one being crawled).
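A personal script can honour the same kind of crawl rules that the bots follow. Below is a rough sketch using Python's standard `urllib.robotparser`. The rules in `SAMPLE_ROBOTS` are invented for illustration and are NOT the forum's real robots.txt, and the user-agent name is likewise made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not the forum's actual file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search.php
Disallow: /memberlist.php
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return the subset of urls that the given robots.txt text permits."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


if __name__ == "__main__":
    candidates = [
        "http://forum.enjoysudoku.com/players-forum-future-proofing-t35870.html",
        "http://forum.enjoysudoku.com/search.php",
    ]
    for url in allowed_urls(SAMPLE_ROBOTS, "my-backup-script", candidates):
        print("OK to fetch:", url)
```

To check against the live rules instead, `RobotFileParser` also offers `set_url()` and `read()`, which download the site's own robots.txt.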
I myself periodically (roughly every 5 patterns games) do incremental copies of threads of interest to me (patterns game, hardest sudokus, etc.) - about 5 threads at most. My downloads amount to a few KB or MB each time. For a rough idea, my 'backup' of the hardest sudokus thread is 4.5MB. I might do a few more in future!
As mentioned before, I only grab the Print view, 1 file per thread page, without the images, scripts, etc. you would be downloading if viewing the page itself - the Print view is in a format I can process using scripts to extract stats and info.
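To make the "Print view, 1 file per thread page, incremental" idea concrete, here is a rough sketch. It is not my actual script. It assumes a phpBB-style query string (`viewtopic.php?t=...&start=...&view=print`) and 15 posts per page, both of which you should check against the real Print view link; the function names, the file naming and the `fetch` callable are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlencode


def print_view_urls(base, topic_id, n_pages, posts_per_page=15):
    """Return one print-view URL per thread page.

    The query parameters follow the usual phpBB pattern (an assumption);
    posts_per_page=15 is also an assumption about the board settings.
    """
    urls = []
    for page in range(n_pages):
        query = {"t": topic_id, "start": page * posts_per_page, "view": "print"}
        urls.append(f"{base}/viewtopic.php?{urlencode(query)}")
    return urls


def save_missing_pages(urls, out_dir, fetch):
    """Save each page as out_dir/page_NNNN.html, skipping files already on disk.

    fetch is any callable taking a URL and returning bytes, for example a
    thin wrapper around urllib.request.urlopen(url).read().
    Returns the list of URLs that were actually downloaded this run.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    fetched = []
    for i, url in enumerate(urls):
        target = out / f"page_{i:04d}.html"
        if target.exists():
            continue  # already copied on an earlier run
        target.write_bytes(fetch(url))
        fetched.append(url)
    return fetched
```

One caveat with this simple version: the last saved page of a thread may have gained posts since the previous run, so a real incremental copy would delete or re-fetch that final page before calling `save_missing_pages` again.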
There are a few open-source web crawlers, but the ones I've tested before would copy/download everything recursively!! I've steered clear of these.