Using Offline Explorer Pro to download a site from the Internet Archive Wayback Machine
|dr john leckenby||07/26/2011 04:17 pm|
|I have downloaded the following site from the Wayback Machine (all original and backup files and drives were erased, and I very much want to recover the site. I built and owned it, although I no longer own the domain name: I thought all was lost in 2008, when the site was erased by others, and let the name lapse.):
I have succeeded in downloading 203,770 files totaling 4.014 GB.
1. When I try to follow a link from the first page in the Offline Explorer Pro browser, it launches IE (current version), which cannot find the files offline (these links are set to open in a new window in the HTML). I have checked the links, and the files all reside on my hard drive but cannot be displayed. IE just shows 170.... and keeps churning away with no error message.
2. I have tried to follow the recommendations from Oleg to client Naomi in this forum of 12/5/2010 entitled "Please help me with re-constructing a site from Wayback Machine!!! garage-door-specialists.co.uk"
3. Here are the settings I have made for this download in Offline Explorer Pro:
(This site, http://www.ciadvertising.org, was archived by the Internet Archive from 2001 to 2009, so there are many copies there)
checked load only within this server
unchecked Load files only from starting directory and below
nothing done here--used default values
set up a rule to remove numbers and unchecked the option to apply to files
I did not do this quite correctly (will re-run), as the numbers were not replaced. I tested this rule, and it does remove the numbers (dates of download on the Wayback Machine) from files:
I greatly appreciate your help, as I thought this site was lost forever, and it represents my life's work as an academician (let alone my students' work). As you may know, the download program recommended on the Internet Archive site no longer works with the changed Wayback Machine, and they indicate it will not work until after August 2011.
BTW, why, when I try to open a .gif from the downloaded offline content in Photoshop to verify that it is on my hard drive, do I get a message saying it cannot open the format?
|Oleg Chernavin||07/26/2011 04:20 pm|
|Can you please give me exact Project settings? Select it, use Export - Project Settings - Copy and then paste to the forum message.
I will do the download and try to see what is wrong.
Regarding the GIF files: the site uses lots of redirects. When you request a URL, it points you to another timestamped version of the file. So many of the downloaded files are actually small HTML pages with redirections.
You may open them to see the exact location of the GIF and other such files.
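To find which downloaded .gif files are really redirect pages, one option is to check the first bytes of each file: a genuine GIF begins with the signature GIF87a or GIF89a, while a redirect page begins with HTML markup. The Python sketch below is only an illustration that runs outside Offline Explorer; the function names and the list of HTML markers are assumptions, not part of the program.

```python
from pathlib import Path

# Every valid GIF file starts with one of these two 6-byte signatures.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")

# Assumed markers for "this looks like an HTML page"; extend as needed.
HTML_MARKERS = (b"<!doctype", b"<html", b"<head", b"<meta", b"<script")


def classify_download(data: bytes) -> str:
    """Return 'gif', 'html', or 'unknown' based on the leading bytes."""
    if data[:6] in GIF_SIGNATURES:
        return "gif"
    head = data[:512].lstrip().lower()
    if head.startswith(HTML_MARKERS):
        return "html"
    return "unknown"


def find_non_gif_files(root: str) -> list[Path]:
    """List *.gif files under root whose content is not real GIF data."""
    suspects = []
    for path in sorted(Path(root).rglob("*.gif")):
        with path.open("rb") as handle:
            if classify_download(handle.read(512)) != "gif":
                suspects.append(path)
    return suspects
```

Pointing find_non_gif_files at the folder where the project was exported would list the files that Photoshop is likely to reject; opening one of them in a text editor should then show where it redirects.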
|Tim||10/26/2011 09:25 am|
I am trying to do the same thing. Is there any way to make the exported files go into one directory instead of all going into date-stamped folders?
|Oleg Chernavin||10/26/2011 10:15 am|
|The best way is to use URL Substitutes (Properties - Parsing) to add a rule:
Then redownload the project and export it.
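The rule itself is not shown above. Purely to illustrate the idea of dropping the date stamp (this is not Offline Explorer's rule syntax), here is a small Python sketch. It assumes the usual Wayback Machine URL layout, http://web.archive.org/web/<timestamp>/<original URL>, where the timestamp has up to 14 digits and may carry a two-letter modifier such as im_; the function name is hypothetical.

```python
import re

# Assumed layout: http://web.archive.org/web/<digits>[xx_]/<original URL>
WAYBACK_PREFIX = re.compile(
    r"^https?://web\.archive\.org/web/\d{1,14}(?:[a-z]{2}_)?/",
    re.IGNORECASE,
)


def strip_wayback_prefix(url: str) -> str:
    """Remove the archive host and timestamp, leaving the original URL."""
    return WAYBACK_PREFIX.sub("", url, count=1)


if __name__ == "__main__":
    archived = (
        "http://web.archive.org/web/20070315123456/"
        "http://www.ciadvertising.org/index.html"
    )
    print(strip_wayback_prefix(archived))
```

With the timestamp removed, every capture of a page maps to the same path, which is why the exported files end up in one directory rather than one folder per date. URLs that do not match the assumed layout are returned unchanged.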
|Tim||10/28/2011 08:27 am|
Thanks, that worked. The only problem is that I'm getting thousands of files and pages from years that I don't want. One site is archived for 2007, and I have this in URL exceptions:
but it still downloads pages from years before 2007.
Many thanks for your help.
|Oleg Chernavin||10/29/2011 02:59 pm|
|Can you post your settings here? Select the Project, press Ctrl+C on the keyboard and paste it into the forum message.
|Tim||11/03/2011 04:07 pm|
Exported=28/10/2011 19:10:24 - D:\directory\domain\
|Oleg Chernavin||11/03/2011 04:09 pm|
|I see now. Please add these keywords to the URL Filters - Directory - Excluded keywords list.