Please help me with re-constructing a site from Wayback Machine!!! garage-door-specialists.co.uk

Author Message
Naomi 12/05/2010 07:59 am
Hi All,

We are currently looking for software that can ASSIST (we know it's not going to be perfect) with reconstructing around 120 sites from the Web Archive. I am looking at Offline Explorer and trying to get this URL back as an initial test: www.garage-door-specialists.co.uk. Can anyone here give us a step-by-step guide on how to do it?

At the moment I am getting several folders with not much in them.

Also, which version of the software will we need if we only do this type of work?

Any help will be greatly appreciated!

naomi
Oleg Chernavin 12/05/2010 08:17 am
Naomi,

Take the URL of the site. For example:

http://web.archive.org/web/20080623044405/www.garage-door-specialists.co.uk/

In Offline Explorer Pro click the New Project button. Enter this URL in the URLs field. Set Level to 10.

In URL Filters - Directory add the Included keyword:

www.garage-door-specialists.co.uk

Uncheck the "Load only from the starting directory" box.
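
To illustrate why a keyword filter is used here, below is a minimal Python sketch (not Offline Explorer's own logic). Every archived page is served from web.archive.org, so a check on the server name alone cannot tell the target site apart from other archived sites, while a keyword check on the full URL can. The function names and the www.example.com address are made up for the illustration.

```python
from urllib.parse import urlparse

# The Included keyword from the filter above.
PROJECT_KEYWORD = "www.garage-door-specialists.co.uk"

def same_server(url, start_url):
    """Host-only check: True when both URLs are served by the same host."""
    return urlparse(url).netloc == urlparse(start_url).netloc

def matches_keyword(url, keyword=PROJECT_KEYWORD):
    """Keyword check: True when the keyword appears anywhere in the URL."""
    return keyword in url

start = "http://web.archive.org/web/20080623044405/www.garage-door-specialists.co.uk/"
# Hypothetical external link as it would appear inside the archive.
external = "http://web.archive.org/web/20080623044405/www.example.com/"

print(same_server(external, start))   # both live on web.archive.org
print(matches_keyword(external))      # the keyword is missing from this URL
print(matches_keyword(start))
```

In this sketch the external link passes the host check but fails the keyword check, which is the behaviour the Included keyword is meant to give.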

If you want to reconstruct the site, go to the Parsing section and click the URL Substitutes button. Add the rule:

URL:
http://web.archive.org/web/*www.garage-door-specialists.co.uk/
Replace:
http://web.archive.org/web/*/
With:
Keep this field empty.

Uncheck the rule you just added, so it is applied to filenames, not URLs. Click OK to save the rule and OK to save Project. Start downloading it.
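
As a rough illustration of what this rule does to each address, here is a small Python sketch. It is not how Offline Explorer implements URL Substitutes; the regular expression and the function name are assumptions made for the example. It removes the archive prefix, including the date, so that only the original site path remains.

```python
import re

# Archive prefix: scheme, host, "/web/", then one path segment holding the
# timestamp (possibly with a suffix), then a slash.
WAYBACK_PREFIX = re.compile(r"^https?://web\.archive\.org/web/[^/]+/")

def strip_wayback_prefix(url):
    """Return the original-site part of an archived URL."""
    rest = WAYBACK_PREFIX.sub("", url, count=1)
    # Some archived URLs embed the original scheme after the date; drop it too.
    return re.sub(r"^https?://", "", rest)

print(strip_wayback_prefix(
    "http://web.archive.org/web/20080623044405/www.garage-door-specialists.co.uk/"
))
print(strip_wayback_prefix(
    "http://web.archive.org/web/20070125222011/www.garage-door-specialists.co.uk/garage-door-installation.php"
))
```

Both results start with www.garage-door-specialists.co.uk/, regardless of which date the page was archived under.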

Please let me know how it works.

Best regards,
Oleg Chernavin
MP Staff
Naomi 12/05/2010 08:52 am
Really appreciate your help. I am running this now, but it seems to be following external links. Can I stop that?
Naomi 12/05/2010 08:54 am
Also, do I not need to do anything in the Link Translation section? Bear in mind that I want to re-publish the site.
Oleg Chernavin 12/05/2010 09:17 am
Use Online Links translation for that. Which external links get downloaded? Perhaps you should check the URL Filters - Server - Load only from the starting server box.

Oleg.
Naomi 12/05/2010 09:48 am
Well, what it seems to be doing is following links like Conservatory Blonds at the bottom of the page and then downloading that site as well. It's doing it for all of the links, so I'm getting a copy of lots of sites. Is there any way of saying DO NOT FOLLOW ANY LINKS NOT ON THE PROJECT DOMAIN? I tried to tick that Load files only within starting server box, but it's still doing it.
Oleg Chernavin 12/05/2010 11:13 am
Did you follow the above advice:

In URL Filters - Directory add the Included keyword:

www.garage-door-specialists.co.uk

Uncheck the "Load only from the starting directory" box.

??

Oleg.
Naomi 12/05/2010 11:30 am
I deleted the project and started again, and it now only downloads from the correct domain. However, it seems to be downloading many more files than the site actually has. Is it downloading from every date variation?

Also, is there a way to tell it to download everything into the correct folders, or do I just have to go into each one and move it manually?
Oleg Chernavin 12/05/2010 12:03 pm
It simply follows the links on the site. You may notice that the date in the URL keeps changing as you browse. For example, go to the page:

http://web.archive.org/web/20080623044405/www.garage-door-specialists.co.uk/

Its date is 20080623044405. Click the Installation & Service link in the left menu. You will be directed to:

http://web.archive.org/web/20070125222011/www.garage-door-specialists.co.uk/garage-door-installation.php

As you can see, the date is different. The site has some internal logic for this.
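
To make the date part of these addresses easier to inspect, here is a short Python sketch. It assumes the 14 digits after /web/ are a timestamp in YYYYMMDDhhmmss form, which matches the two examples above; the function names are made up for the illustration. The second helper keeps only the newest archived address for each page, which is one possible way to thin out the date variations.

```python
import re
from datetime import datetime

# "/web/", 14 digits of timestamp, an optional suffix, a slash, then the page.
SNAPSHOT = re.compile(r"/web/(\d{14})[^/]*/(.+)$")

def split_snapshot(url):
    """Split an archived URL into (capture time, original page address)."""
    match = SNAPSHOT.search(url)
    if match is None:
        raise ValueError("not an archived URL: " + url)
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured, match.group(2)

def latest_per_page(urls):
    """Keep only the most recently captured URL for each original page."""
    best = {}
    for url in urls:
        captured, page = split_snapshot(url)
        if page not in best or captured > best[page][0]:
            best[page] = (captured, url)
    return {page: url for page, (captured, url) in best.items()}

captured, page = split_snapshot(
    "http://web.archive.org/web/20070125222011/www.garage-door-specialists.co.uk/garage-door-installation.php"
)
print(captured.isoformat(), page)
```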

Oleg.
Oleg Chernavin 12/05/2010 12:04 pm
If you use the URL Substitutes rule (described above), it will place everything in the correct folders and will remove the dates, etc.

Oleg.
Naomi 12/08/2010 05:44 am
Hi Again,

I have had a chance to have another go, but now when I download the files from another site and then export them to a local directory, we get things like:

cnclp3.php@id=1
cnclp3.php@id=2
cnclp3.php@id=3

What should the filenames be for those types of files? How do I make the system export them correctly?
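
For reference, one possible reading of these names, sketched in Python: assuming the "@" stands in for the "?" of the original address (a "?" is not allowed in Windows filenames), cnclp3.php@id=1 would correspond to the request cnclp3.php?id=1. The static naming scheme below (cnclp3_id_1.html) is purely hypothetical, as are the function names; any links pointing at these pages would need to be changed to match.

```python
import re

def exported_name_to_request(name):
    """Split an exported name into (script, query), treating "@" as "?"."""
    script, _, query = name.partition("@")
    return script, query

def static_name(name):
    """Suggest a plain .html filename for a page that had a query string."""
    script, query = exported_name_to_request(name)
    if not query:
        return name  # no query part, keep the name unchanged
    stem = script.rsplit(".", 1)[0]
    # Replace anything that is not a letter or digit with an underscore.
    safe = re.sub(r"[^A-Za-z0-9]+", "_", query).strip("_")
    return stem + "_" + safe + ".html"

for exported in ["cnclp3.php@id=1", "cnclp3.php@id=2", "cnclp3.php@id=3"]:
    print(exported, "->", static_name(exported))
```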

Also, I seem to be getting all of the variations of a certain file, such as:
cnclp_logo6.jpg
cnclp_logo7.jpg
cnclp_logo8.jpg

Is it possible to tell it to keep just one? Or do I need to delete them manually?
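
If it does come down to cleaning up by hand, a small script can at least list the copies. The Python sketch below groups files in an exported folder by a hash of their contents, so it only finds copies that are byte-for-byte identical; logos that differ between archive dates would not be reported. The function name and the "exported_site" folder are placeholders, and nothing is deleted.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root):
    """Return groups of files under root whose contents are identical."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    # Only groups with more than one file are duplicates.
    return [paths for paths in by_hash.values() if len(paths) > 1]

if __name__ == "__main__":
    # "exported_site" is a placeholder for the local export directory.
    for group in find_duplicates("exported_site"):
        print("identical:", ", ".join(str(p) for p in group))
```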

Any suggestions?

regards,
Naomi

Oleg Chernavin 12/09/2010 02:34 pm
I made some changes in Offline Explorer. I need you to test them. If you find these improvements useful, I will add them as a feature to the next version.

Oleg.
leeuniverse 12/27/2010 07:50 pm
Hey, glad this has been brought up again... I never was able to download that site (restorationhistory.com) at the Web Archive "properly", without having a million files, folders, and copies of the same pic and/or files. I will try also soon again with the new version per these new directions. Hopefully we can finally get a good download method going.

Thanks much Oleg...
leeuniverse 12/27/2010 08:23 pm
Hey Oleg.... is this the version with your "changes"???

11/16/2010 - Offline Explorer Enterprise 5.9.3284 Service Release 2

Naomi 12/28/2010 06:00 am
Hi, the version is now getting much better. With only a few more small steps, this could be the first software that retrieves stuff from the Wayback Machine with much reduced hassle. Just waiting for some minor changes to be made so I can get on with my work. Thanks Oleg. Looking forward to seeing the new version.