This is because the HTML files are generated on the fly by the Vignette content management system. Annoyingly, the page content is actually the same apart from Vignette adding the following string:
<!-- Vignette V6 Tue Jan 25 18:56:47 2005 -->
which indicates the date/time the file was generated. It also adds a similar string at the beginning of each file, although that one is the date the content was last modified (i.e. it only changes when the content changes).
Because the image files are not generated on the fly, they always stay the same and so are not downloaded again unless they are modified.
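To illustrate the point, here is a minimal Python sketch of how two downloads of the same page could be compared while ignoring the Vignette stamp. The regular expression is an assumption based on the single sample string quoted above, and the function names are made up for the example; this is not how Offline Explorer works internally.

```python
import hashlib
import re

# Assumed stamp format, taken from the one sample in this thread:
# <!-- Vignette V6 Tue Jan 25 18:56:47 2005 -->
VIGNETTE_STAMP = re.compile(r"<!--\s*Vignette V\d+ [^>]*?-->")


def content_fingerprint(html: str) -> str:
    """Hash the page text with every Vignette stamp removed."""
    stripped = VIGNETTE_STAMP.sub("", html)
    return hashlib.sha256(stripped.encode("utf-8")).hexdigest()


def same_content(old_html: str, new_html: str) -> bool:
    """True when two downloads differ only in their Vignette stamps."""
    return content_fingerprint(old_html) == content_fingerprint(new_html)
```

With something like this, a page regenerated a minute later would compare as unchanged, while a real edit to the body would still be detected.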
That's the explanation of what's going on.
Because the files are actually the same size, though, I would have thought that enabling "Check file size" would stop them from being redownloaded, but for some reason it doesn't. Without checking, I would guess that OE cannot determine the file size and so has to download the file anyway.
> Best regards,
> Oleg Chernavin
> MP Staff
Is there an IIS server option that needs to be set to send the size?
It would be cool if there was a "Stop downloading page if it contains above keywords" Content filters feature that worked "on-the-fly", allowing the download of a page to be stopped as soon as any of the keywords are found. For this feature to be really useful though, macros would need to be supported.
With this functionality, the problem experienced by Jack (and other ASP/PHP site problems of this nature) could be reduced to a minimum, because the whole file would not have to be downloaded.
It might be worth benchmarking a download of the www.oecd.org site, where each HTML file is anywhere from a few KB to a couple of hundred KB, with and without this feature enabled, to see whether the speed benefit is worthwhile (i.e. whether parsing on the fly and aborting takes longer than simply downloading the file).
"Do not save any pages that contain keywords" and "Stop downloading pages when keywords found" options. Currently the whole file has to be downloaded anyway, before the file is parsed and a decision is made as to whether to save or discard it. If there are a lot of large files this could take a long time, even though only a few might end up being saved. With on-the-fly filtering, the download (of the page or project) would stop as soon as one of the keywords is found, which in the extreme case could be right at the start of the file.
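A rough Python sketch of the on-the-fly idea is below. It assumes the download arrives as an iterable of byte chunks (for example, successive socket reads); the function name is hypothetical, and it only shows the scanning logic, not a real downloader. The one subtlety is that a keyword can straddle two chunks, so a short tail of the previous chunk is kept.

```python
from typing import Iterable, Optional, Sequence


def first_keyword_in_stream(chunks: Iterable[bytes],
                            keywords: Sequence[bytes]) -> Optional[bytes]:
    """Return the first keyword seen, reading no more chunks than needed."""
    keywords = [k for k in keywords if k]  # ignore empty keywords
    if not keywords:
        return None
    # Keep enough of the previous chunk to catch a keyword split across chunks.
    overlap = max(len(k) for k in keywords) - 1
    tail = b""
    for chunk in chunks:
        window = tail + chunk
        # If several keywords are present, the earliest position wins.
        hits = [(window.find(k), k) for k in keywords if k in window]
        if hits:
            return min(hits)[1]  # caller can abort the download here
        tail = window[-overlap:] if overlap > 0 else b""
    return None
```

Because the function returns as soon as a keyword turns up, any chunks after that point are never requested from the iterable, which is where the time saving on large pages would come from.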
I have a few more ideas for making Content filters more powerful, although they would require reworking the Content filters prefs page and its inner workings. For example, currently all keywords relate to all options. It would be much more powerful to have a list of content filters. When creating a new filter, you would select which option(s) to use and create a list of keywords for that filter to act on. The filters could be moved up and down the list (filters are applied in top-down order) so that some filters could be given priority over others. An example of this method in action would be firewall rules (e.g. the LooknStop firewall).
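As a sketch of what I mean by an ordered list, assuming a first-match-wins rule like a firewall, something along these lines would do. The class, the field names, and the action strings are all invented for the example and do not correspond to anything in the current program.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContentFilter:
    name: str
    keywords: List[str]
    action: str  # hypothetical actions: "save", "skip", "stop"


def first_matching_action(filters: List[ContentFilter],
                          page_text: str) -> Optional[str]:
    """Apply filters top-down; the first filter with a keyword hit decides."""
    text = page_text.lower()
    for f in filters:
        if any(k.lower() in text for k in f.keywords):
            return f.action
    return None  # no filter matched; the caller falls back to its default
```

Moving a filter up the list gives it priority: with a "skip" filter above a "save" filter, a page matching both is skipped, and swapping the two reverses the result.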
You could also have another option to apply a particular filter only if the file doesn't already exist.
PS. It may just be me having a blonde moment, but aren't the "Save all pages that do not contain the above keywords" and "Do not save any pages that contain the above keywords" options the same?
>> same ??
If they are different, then the first option should read "Save all pages that contain the above keywords" (i.e. remove the "do not").
Getting back to Last-Modified: when I look at the OE log, I see this GET for a CSS file:
Host www.oecd.org connected. Waiting for http://www.oecd.org/dataoecd/style/oecd_cda_0.css.
GET /dataoecd/style/oecd_cda_0.css HTTP/1.0
If-Modified-Since: Tue, 23 Nov 2004 08:30:45 GMT
Accept: *.*, */*
However, for an HTML page I only see:
Host www.oecd.org connected. Waiting for Http://www.oecd.org/department/0,2688,en_2649_34487_1_1_1_1_1,00.html.
GET /department/0,2688,en_2649_34487_1_1_1_1_1,00.html HTTP/1.0
Accept: *.*, */*
How come I don't see the If-Modified-Since header?
HTTP/1.1 200 OK
Last-Modified: Tue, 23 Nov 2004 08:32:28 GMT
A similar line is not returned by the server for HTML pages. This is why Offline Explorer cannot check whether the file has changed. You may say that it would be worth adding the If-Modified-Since: <last download date> line in any case, but it is certain that this line would make no difference. If the server doesn't return a file modification date, it will not even look at the If-Modified-Since line for that file.
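The behaviour seen in the logs can be summarised with a small Python sketch. It is only an illustration of the rule described above, not Offline Explorer's actual code; the function name and the idea of a stored Last-Modified value per URL are assumptions made for the example.

```python
from typing import Dict, Optional


def build_request_headers(stored_last_modified: Optional[str]) -> Dict[str, str]:
    """Add If-Modified-Since only when the server gave us a date last time."""
    headers = {"Accept": "*.*, */*"}
    if stored_last_modified:
        # Echo back the Last-Modified value from the previous response.
        headers["If-Modified-Since"] = stored_last_modified
    return headers
```

For the CSS file, the previous response carried a Last-Modified date, so the conditional header is sent; for the generated HTML page there is no stored date, so the request goes out as a plain GET and the page is downloaded in full.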
"Do not save any pages that contain the above keywords" can be used with the above filter: in this case, when a keyword is found, the page will not be saved; only pages without keywords are saved.
I agree with you that filters could be more flexible and even grouped. We plan to work on this, but so far there have been no real requests for it.