Subsequent downloads after URL substitutes and skipping existing files

Tony 09/26/2005 06:24 am

I am using URL substitutes in my projects.

As the website I download from updates, I have to make subsequent downloads and need to use the "skip existing files on levels higher than" option.

I notice that when I download the second time, OE Pro looks for the URLs I created using URL substitutes. Nothing is downloaded, of course, but I see from the log that `404 not found` responses are fetched instead (and, it seems, discarded).

The problem is I am hitting the target server and causing all these 404 messages, using bandwidth and no doubt causing whoever is using the target server to wonder what is going on.

Is there a way to prevent this please?

Many thanks,

Oleg Chernavin 09/26/2005 07:46 am
You have to redownload the site, because all links in the downloaded Web pages were rewritten according to the URL Substitutes rules, so Offline Explorer does not know how to restore the original links.

Best regards,
Oleg Chernavin
MP Staff

Tony 09/26/2005 09:45 pm
Hello Oleg,

This is a problem because I actually extract data and put it into a database. If I completely redownload the website, the first download's data will be doubled up, and it will take much more time.

Is there no way around this? Or are there plans to do something about this in a future version (hopefully due out soon?!)

Many thanks,

Oleg Chernavin 09/27/2005 04:07 am
Please try to add the following line to the URLs field of the Project:

additional=primary

I am not 100% sure that it will help, but this will keep original (not parsed) copies of downloaded HTML pages. They will be used while updating the site.

Tony 09/27/2005 06:00 am
Hi Oleg,

I am changing URLs for jpg files. Unfortunately, with additional=primary I just get the rewritten URL with .primary on the end.

Oleg Chernavin 09/27/2005 06:39 am
And does the redownload request the correct .jpg files or the substituted ones?

Tony 09/27/2005 08:14 am
All it does is repeat the export and processing of URLs already downloaded, hence doubling up the data in the database, unfortunately.
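
For readers hitting the same doubling problem, one possible mitigation sits outside Offline Explorer entirely, on the database side: key each extracted record by its URL so that a repeated export is ignored rather than inserted twice. The following is only a minimal sketch under assumptions. The thread does not say which database or schema is in use, so SQLite, the table name `extracted`, and the helpers `open_store` and `store_record` are all hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical store for extracted records."""
    conn = sqlite3.connect(path)
    # The URL is the primary key, so the same URL can be stored only once.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted ("
        " url TEXT PRIMARY KEY,"
        " payload TEXT NOT NULL)"
    )
    return conn


def store_record(conn, url, payload):
    """Insert one record; return True if it was new, False if the URL was already stored."""
    cur = conn.execute(
        # OR IGNORE skips the row silently when the URL already exists.
        "INSERT OR IGNORE INTO extracted (url, payload) VALUES (?, ?)",
        (url, payload),
    )
    conn.commit()
    return cur.rowcount == 1
```

With this in place, a second pass over already-processed URLs leaves the table unchanged. It does not reduce the repeated export and processing time, only the duplicated rows.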
Oleg Chernavin 09/27/2005 09:05 am
I see. I am sorry, I have no other solution right now.

Tony 09/27/2005 10:50 am
OK - I can only think to scrap the URL substitutes for now and use batch files to move the jpg files where I want them. Not quite the same, but I might come up with a workaround.

Nice if a better solution can be found though :-)

Oleg Chernavin 09/28/2005 02:03 am
I agree, but I have no ideas on that now. Sorry.