URL Substitute - Export Recently Downloaded Files Only

Author Message
Tony 12/17/2006 12:01 pm
Hello,

I am using version 4.5.

My project downloads files from a website that is updated, so I do multiple downloads.

When the website updates, the links to pages change slightly to reflect their sequence number.

I have been downloading this website for a year or so without problem using a url substitute to delete the sequence number.

But a minor change to the sequence numbering in the URL meant I had to amend the URL substitute. Again, it simply deletes the sequence number and is effective in doing so.
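For readers unfamiliar with the idea, here is a minimal Python sketch of what "deleting the sequence number" from a link achieves. It is not Offline Explorer's URL Substitutes syntax. The URL layout (`report_0041.html`), the pattern, and the function name `strip_sequence` are all hypothetical, since the actual site and rule are not given in this thread.

```python
import re

# Hypothetical pattern: an underscore followed by digits, right before ".html".
# The real site's numbering scheme is not known from this thread.
SEQ_PATTERN = re.compile(r"_\d+(?=\.html$)")


def strip_sequence(url: str) -> str:
    """Return the URL with the assumed sequence number removed."""
    return SEQ_PATTERN.sub("", url)


# Two links that differ only in the assumed sequence number.
urls = [
    "http://example.com/news/report_0041.html",
    "http://example.com/news/report_0042.html",
]

# After normalization both links map to the same name, so the page
# is treated as one file rather than two.
normalized = {strip_sequence(u) for u in urls}
```

If the numbering scheme changes (as described above), the pattern has to change with it, otherwise the old and new links stop mapping to the same name.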

BUT - OE exports the new file, even though my project settings are set to export recently downloaded files only. Unfortunately as I data mine each page for a database, I am now getting multiple entries of numerous files.

Note that the url substitute url box is unchecked, and 'apply all matching rules' is checked.

Auto Export is set to: Additional=AutoExport=c:\xxx\;0010000010

File Modification Check is set to 'skip existing files on levels higher than 1' (and has been for a year +). Before the minor sequence number change, I found that any other setting resulted in multiple downloads too.

I updated a number of projects for the website using a template.

Possibly there is a bug somewhere in 4.5? Or do you see some problem in my settings?

Many thanks,

Tony.
Oleg Chernavin 12/17/2006 01:43 pm
Perhaps the same files get downloaded again and again, and this is why the export considers them newly downloaded. I think the server may not support file modification date checks. To ensure that only really updated files are downloaded and saved, you may try adding the following line to the Project's URLs field:

Additional=CountCRC

Best regards,
Oleg Chernavin
MP Staff
Tony 12/19/2006 11:45 am
Hi Oleg,

I don't think they have changed servers, so I don't think file modification date checks are the problem (they have not been a problem until now).

Unfortunately Additional=CountCRC does not make a difference.

(Ummm - what does CRC mean by the way?)

Any other ideas?

Many thanks,

Tony.



Oleg Chernavin 12/19/2006 12:02 pm
CRC (cyclic redundancy check) is a checksum of each file, used to check whether its contents are identical or not. I cannot suggest anything now without knowing more details.

Oleg.
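To illustrate the checksum idea in general terms, here is a minimal Python sketch using the standard library's CRC-32. It only shows how comparing checksums can tell whether a file's contents changed. How Offline Explorer computes or stores its checksums is not described in this thread, and the helper names `crc32_of` and `is_changed` are made up for the example.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """Return the CRC-32 checksum of the data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def is_changed(old: bytes, new: bytes) -> bool:
    """Treat a file as changed when its checksum differs from the stored copy."""
    return crc32_of(old) != crc32_of(new)


# Same contents give the same checksum; edited contents give a different one.
previous_download = b"<html><body>Item 41</body></html>"
same_again = b"<html><body>Item 41</body></html>"
updated = b"<html><body>Item 42</body></html>"

unchanged_result = is_changed(previous_download, same_again)
changed_result = is_changed(previous_download, updated)
```

A checksum comparison like this does not depend on the server reporting a modification date, which is why it can help when date checks are unreliable.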