Excluding existing local files from download

Gerald 10/24/2003 08:05 am
I am currently evaluating Outlook Explorer Pro for mirroring several technical web sites to be placed on a dedicated internal LAN. Is it possible to prevent OEP from downloading files from a site that are already stored locally (i.e. add the existing files to the project)?

The problem I am running into is that one of my sites contains over 2 GB of PDF documents. These have been previously downloaded using another software package. I would like to move these files into the new OEP project and process them as part of the OEP project, only downloading new or updated files.

I could move the files (stored in the site's structure) into the OEP directory, exclude this directory from the project, and translate all links to relative locations (i.e. old_software_path/www.website.com/pdffiles/*.pdf -> new_OEP_location/www.website.com/pdffiles/*.pdf). This has worked. However, including this online location causes OEP to redownload every file, since it isn't already in the Map.

Is there an option to compare the local file size with the size reported by the site, and add (but NOT redownload) files that are stored locally but are not currently in the Map? I can see a feature like this being useful in many other circumstances as well. What do you think?

Thanks a million,
Gerald
Oleg Chernavin 10/25/2003 07:33 am
It should be quite simple to do. If you select "Do not load existing files" in the Project Properties dialog and all PDFs are properly placed in the Download directory where Offline Explorer stores files, then when downloading the site, Offline Explorer will not load these existing files. Also, it will add these files to the Project's Map, so it will be possible to export the whole site, etc.
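To illustrate why exact file placement matters for this option, here is a minimal Python sketch of an existence-based skip check. It assumes the Download directory mirrors each URL as `<download dir>/<host>/<path>`, like the `www.website.com/pdffiles/*.pdf` layout mentioned above. Offline Explorer's real naming rules (query strings, default file names, case handling) may differ, and `local_path_for` and `should_skip` are hypothetical names, not part of the product.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit, unquote


def local_path_for(url: str, download_dir: Path) -> Path:
    """Map a URL to <download_dir>/<host>/<path> (an assumed layout)."""
    parts = urlsplit(url)
    # Decode %XX escapes and drop the leading "/" so the path is relative.
    rel = PurePosixPath(unquote(parts.path).lstrip("/"))
    return download_dir / parts.netloc / Path(*rel.parts)


def should_skip(url: str, download_dir: Path) -> bool:
    """Skip the download if a file already exists at the mapped location."""
    return local_path_for(url, download_dir).is_file()
```

If a PDF sits even one directory level off from where this mapping expects it, the check fails and the file is fetched again, which is consistent with the symptom discussed later in the thread.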

I hope this helps.

Best regards,
Oleg Chernavin
MetaProducts corp.
Gerald 10/26/2003 08:52 am
Thanks, Oleg. I did try this, but it continued to download all the files anyway. I'll give it another try...maybe another setting was interfering.

Gerald
Gerald 10/26/2003 08:55 am
Actually, I hope this works now. Outlook Explorer Pro just locked up on me while I was exporting a project. I've lost ALL my project files! There were only two, but each took several hours to download and had quite a few filter options in them.

Gerald
Oleg Chernavin 10/27/2003 03:52 am
You can restore Projects easily from backup copies of the projects file (webdown.* files).

Oleg.
Gerald 10/27/2003 11:38 am
Oops...I've been calling your product "Outlook Explorer Pro". Sorry!!!

I tried the setting again and was still unable to prevent the PDF files from being redownloaded. Here are some of my current settings; maybe you can readily see the problem. Is there a way to send you my complete project settings for review?

NOTE: The primary site is located at http://sunsolve.sun.com/handbook_pub
NOTE: PDF files are located at http://sunsolve.sun.com/data/###/###-####/pdf/ and
http://sunsolve.sun.com/data/###/###-####/html/
NOTE: Undesired PDF files (duplicates) also located at
http://sunsolve.sun.com/data/###/###-####-##/pdf/

NOTE: ###/###-#### represents some document ID location (e.g. 805/805-1234/pdf/)

Project:
Addresses (URLs) = http://sunsolve.sun.com/handbook_pub
Level limit = unchecked
File modification check = Do not download existing files

File filters (default settings with the exception of Archives):
Text = Load using URL filter settings
Images = Load from any site
Video = Load using URL filter settings
Audio = Load from any site
Archives = unchecked
User defined = Load from any site
Other = Load using URL filter settings

URL Filters
Server = Load files only within the starting server
Directory = Custom directories configuration
Custom configuration keywords (include):
/handbook_pub/
/data/*/pdf/
/data/*/html/
Custom configuration keywords (exclude) -- Quite a few but none from /data/:
/Systems/*Netra*/
/Systems/SS*/
...and so on
Filenames = Custom filenames configuration
Custom configuration keywords (exclude)
*Netra*
SunFire.html
...and so on
Custom configuration keywords (include)=none configured
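One way to check whether the `/data/*/pdf/` include keyword also admits the unwanted `###/###-####-##` duplicates is sketched below in Python. It assumes a keyword may occur anywhere in the URL path and that `*` matches any run of characters, including `/` (the behavior of Python's `fnmatchcase`); whether Offline Explorer's keyword matching works the same way is an assumption. It also assumes the `#` placeholders are digits. The stricter rule is a regular expression for illustration only, not Offline Explorer filter syntax.

```python
import re
from fnmatch import fnmatchcase


def keyword_matches(path: str, keyword: str) -> bool:
    """Assumed semantics: keyword may appear anywhere; '*' also matches '/'."""
    return fnmatchcase(path, f"*{keyword}*")


# Hypothetical stricter rule: ###/###-####/pdf/ with no trailing -## suffix.
PRIMARY_PDF_DIR = re.compile(r"/data/\d{3}/\d{3}-\d{4}/pdf/")

paths = [
    "/data/805/805-1234/pdf/",     # wanted document directory
    "/data/805/805-1234-10/pdf/",  # duplicate directory
]
for p in paths:
    print(p, keyword_matches(p, "/data/*/pdf/"), bool(PRIMARY_PDF_DIR.search(p)))
# /data/805/805-1234/pdf/ True True
# /data/805/805-1234-10/pdf/ True False
```

Under these assumptions the wildcard keyword accepts both directories, so the duplicates would need an explicit exclude rule to be filtered out.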


One odd thing about the options: with custom directories, I have to include /handbook_pub/ or it will not parse anything (it excludes all directories except those in the include list). With custom filenames, however, the include list does not seem to matter; the download works without any entries in it.

Thanks a million for your help!!! Your product is the first that properly parses the JavaScript at this site! Other products I have tried that support JavaScript (like HTTrack) have problems with it.

Gerry

Do you see anything that pops out?
Gerald 10/28/2003 05:21 am
I have some updated information on this...

It appears that both "Download new and updated files" with the "Check file size" option and the other option we have been discussing work. However, if the file is quite large (say 22 MB), it gets redownloaded.
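For comparison, a size-based "is this file already up to date?" check can be sketched in Python as below. It assumes the server reports the file size in the `Content-Length` header of an HTTP `HEAD` response. This is not a description of how Offline Explorer's "Check file size" option is implemented, and `remote_size` and `needs_download` are hypothetical helper names.

```python
import os
from typing import Optional
from urllib.request import Request, urlopen


def remote_size(url: str, timeout: float = 30.0) -> Optional[int]:
    """Ask the server for the file size via a HEAD request (Content-Length)."""
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=timeout) as resp:
        length = resp.headers.get("Content-Length")
    return int(length) if length is not None else None


def needs_download(local_path: str, size_on_server: Optional[int]) -> bool:
    """Download if the local file is missing, no size is known, or sizes differ."""
    if not os.path.isfile(local_path):
        return True
    if size_on_server is None:
        return True  # nothing to compare against; re-fetching is the safe choice
    return os.path.getsize(local_path) != size_on_server
```

Usage would be `needs_download(path, remote_size(url))`. Keeping the comparison separate from the network call makes it easy to test against local files.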

I'll rerun the project tonight to see whether the files are downloaded again now that they are in the project. Then I'll retest by removing a file from the project and copying it back into the proper directory.

Thanks,
Gerald
Oleg Chernavin 10/28/2003 07:31 am
> Oops...I've been calling your product "Outlook Explorer Pro". Sorry!!!

No problem with that at all! I haven't even noticed it! :-)

Well, in short, the problem is that you placed some of the PDF files into the proper directories, but Offline Explorer still loads them. Since you are using the "Do not load existing files" option, it should skip a file simply if it is present.

What could be wrong is that Offline Explorer expects the PDF file to have a slightly different name or to be in another directory.

Can you please send me a few examples of PDF URLs it redownloads and the corresponding filenames (with paths) on your hard disk?

Oleg.