The problem I am running into is that one of my sites contains over 2 GB of PDF documents. These have been previously downloaded using another software package. I would like to move these files into the new OEP project and process them as part of the OEP project, only downloading new or updated files.
I could move the files (stored in the site's structure) into the OEP directory, exclude this directory from the project, and translate all links to relative locations (i.e. old_software_path/www.website.com/pdffiles/*.pdf -> new_OEP_location/www.website.com/pdffiles/*.pdf). This has worked. However, including this online location causes OEP to redownload every file since it isn't pre-existing in the Map.
Is there an option to compare the local file size with the size stored at the site and NOT redownload files that are present locally but not currently in the Map? I can see a feature like this being useful in many other circumstances as well. What do you think?
Thanks a million,
Gerald
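For readers following along, the size check requested above can be sketched outside of OEP. This is only an illustration of the idea, not how Offline Explorer works internally: the function names `remote_size` and `needs_download` are made up here, and the sketch assumes the server answers a HEAD request with a Content-Length header.

```python
import os
import urllib.request
from typing import Optional


def remote_size(url: str, timeout: float = 10.0) -> Optional[int]:
    """Ask the server for the file size with a HEAD request (no body is transferred).

    Returns None when the server does not report a Content-Length.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def needs_download(local_path: str, server_size: Optional[int]) -> bool:
    """Decide whether a file has to be fetched again (hypothetical helper)."""
    if not os.path.isfile(local_path):
        return True  # nothing on disk yet
    if server_size is None:
        return True  # no size reported, so the local copy cannot be verified
    return os.path.getsize(local_path) != server_size
```

A file that was copied in by hand would then be skipped whenever `needs_download(path, remote_size(url))` is false, whether or not it is recorded in the Map. Matching sizes do not prove the contents are identical, so this is a cheap heuristic rather than a guarantee.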
I hope this helps.
Best regards,
Oleg Chernavin
MetaProducts corp.
Gerald
Oleg.
I tried the setting again and was still unsuccessful at preventing the PDF files from being redownloaded. Here are some of my current settings. Maybe you can readily see the problem. Is there a way to forward my complete project settings to you for review?
NOTE: The primary site is located at http://sunsolve.sun.com/handbook_pub
NOTE: PDF files are located at http://sunsolve.sun.com/data/###/###-####/pdf/ and
http://sunsolve.sun.com/data/###/###-####/html/
NOTE: Undesired PDF files (duplicates) also located at
http://sunsolve.sun.com/data/###/###-####-##/pdf/
NOTE: ###/###-#### represents a document ID location (e.g. 805/805-1234/pdf/)
Project:
  Addresses (URLs) = http://sunsolve.sun.com/handbook_pub
  Level limit = unchecked
  File modification check = Do not download existing files
File filters (default settings with the exception of Archives):
  Text = Load using URL filter settings
  Images = Load from any site
  Video = Load using URL filter settings
  Audio = Load from any site
  Archives = unchecked
  User defined = Load from any site
  Other = Load using URL filter settings
URL Filters:
  Server = Load files only within the starting server
  Directory = Custom directories configuration
    Custom configuration keywords (include):
      /handbook_pub/
      /data/*/pdf/
      /data/*/html/
    Custom configuration keywords (exclude) -- quite a few, but none from /data/:
      /Systems/*Netra*/
      /Systems/SS*/
      ...and so on
  Filenames = Custom filenames configuration
    Custom configuration keywords (exclude):
      *Netra*
      SunFire.html
      ...and so on
    Custom configuration keywords (include) = none configured
Funny thing with the options: with custom directories, I have to include /handbook_pub/ or it will not parse anything (it excludes all directories except those in the include list). With custom filenames, however, an empty include list doesn't appear to block the download.
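To make the directory filter behaviour described above concrete, here is a rough sketch of one way such include/exclude keywords could be evaluated. The matching rules are assumptions, not OEP's documented logic: a keyword is treated as a wildcard pattern that may match anywhere in the directory part of the URL, exclude keywords win, and a non-empty include list acts as a whitelist. The helper names are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Keywords copied from the project settings listed above.
INCLUDE = ["/handbook_pub/", "/data/*/pdf/", "/data/*/html/"]
EXCLUDE = ["/Systems/*Netra*/", "/Systems/SS*/"]


def keyword_matches(keyword: str, directory: str) -> bool:
    """Assumed rule: the keyword may match anywhere inside the directory path."""
    return fnmatchcase(directory, "*" + keyword + "*")


def directory_allowed(url: str, include=INCLUDE, exclude=EXCLUDE) -> bool:
    """Return True if the URL's directory passes the assumed filter rules."""
    path = urlsplit(url).path
    directory = path[: path.rfind("/") + 1]  # drop the file name, keep the trailing slash
    if any(keyword_matches(k, directory) for k in exclude):
        return False  # exclude keywords take priority
    if include:
        # A non-empty include list admits only matching directories.
        return any(keyword_matches(k, directory) for k in include)
    return True  # an empty include list admits everything not excluded
```

Under these assumed rules, `/data/*/pdf/` also matches the undesired duplicate directories of the form /data/###/###-####-##/pdf/, because `*` spans the whole document ID. If OEP matches keywords in a similar way, the duplicates would need an exclude keyword of their own.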
Thanks a million for your help!!! Your product is the first that properly parses the JavaScript at this site! Other products I have tried that support JavaScript (like HTTrack) are having problems.
Gerry
Do you see anything that pops out?
It appears that "Download new and updated files" with the "Check file size" option works as well as the other option we have been discussing. However, if the file is quite large (say 22 MB), it gets redownloaded.
I'll try to rerun the project tonight and see if the files are downloaded again, now that they are in the project, and retest by removing the file from the project and copying it back into the proper directory.
Thanks,
Gerald
No problem with that at all! I haven't even noticed it! :-)
Well, in short, the problem is that you placed some of the PDF files into the proper directories, yet Offline Explorer still loads them. Since you are using the "Do not load existing files" option, it should skip a file simply because it is present.
What could be wrong is that Offline Explorer expects the PDF file to have a slightly different name or to be in another directory.
Can you please send me a few examples of PDF URLs it redownloads and the corresponding filenames (with paths) on your hard disk?
Oleg.
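One way to collect the URL and filename pairs requested above is to compute where each file would be expected on disk and list the ones that are not there. The sketch below assumes the layout mentioned earlier in the thread (download directory, then host name, then the URL path); the real mapping used by Offline Explorer may differ, for example for query strings or special characters. Both function names are hypothetical.

```python
from pathlib import Path
from urllib.parse import unquote, urlsplit


def expected_local_path(url: str, download_root: str) -> Path:
    """Map a URL to <download_root>/<host>/<path> (assumed layout)."""
    parts = urlsplit(url)
    relative = unquote(parts.path).lstrip("/")
    return Path(download_root) / parts.netloc / relative


def missing_on_disk(urls, download_root):
    """Return (url, expected_path) pairs for files not found where expected."""
    report = []
    for url in urls:
        candidate = expected_local_path(url, download_root)
        if not candidate.is_file():
            report.append((url, candidate))
    return report
```

Any pair that shows up in the report for a file you believe you copied in points to a name or directory mismatch, which is exactly the kind of example worth posting.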