I am using URL substitutes in my projects.
As the website I download from updates, I have to perform subsequent downloads using the "skip existing files on levels higher than" option.
I notice that on the second download, OE Pro requests the URLs I have created using URL substitutes. Nothing is downloaded, of course, but I can see from the log that `404 Not Found` responses are returned instead (and, it would seem, discarded).
The problem is that I am still hitting the target server and generating all these 404s, wasting bandwidth and no doubt making whoever runs the target server wonder what is going on.
Is there a way to prevent this, please?
This is a problem because I actually extract data and put it into a database. If I completely re-download the website, the first download's data will be duplicated, and the whole process takes much more time.
Is there no way around this? Or are there plans to address it in a future version (hopefully due out soon)?
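In case it helps anyone in the meantime, here is what I mean by not wanting the dead rewritten URLs fully fetched, sketched outside the program. The helper names are my own, not an OE Pro feature, and this assumes you can script the list of rewritten URLs yourself:

```python
# Client-side preflight sketch (illustrative only, not an OE Pro option):
# probe each rewritten URL with a cheap HEAD request and keep only the
# ones that still resolve, so dead URLs are skipped instead of being
# downloaded in full and discarded as 404s.
import urllib.error
import urllib.request


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request (e.g. 404 for dead URLs)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def live_urls(urls, probe=head_status):
    """Keep only the URLs whose probe reports a status below 400."""
    return [u for u in urls if probe(u) < 400]
```

A HEAD request still touches the server, but it is far cheaper than a full download; whether OE Pro itself can be told to do an equivalent check, I don't know.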
I am not 100% sure it will help, but this will keep original (unparsed) copies of the downloaded HTML pages. They will be used when updating the site.
I am changing URLs for JPG files. Unfortunately, with `additional=primary` I just get the rewritten URL with `.primary` appended.
It would be nice if a better solution could be found, though :-)
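To show what I mean by rewriting the URL outright rather than getting a suffix tacked on the end, here is a generic regex illustration in plain Python. This is not OE Pro's substitute syntax, and the folder names are made up:

```python
# Generic URL-substitute illustration (not OE Pro's rule syntax):
# a regex rewrite replaces part of a .jpg URL in place, rather than
# appending something like ".primary" to the end of it.
import re


def substitute(url: str, pattern: str, replacement: str) -> str:
    """Apply one rewrite rule; return the URL unchanged if it doesn't match."""
    return re.sub(pattern, replacement, url)


# Example rule: move JPG files from an "images" folder to a "cache" folder.
rewritten = substitute(
    "http://example.com/images/photo.jpg",
    r"/images/(?=[^/]+\.jpg$)",
    "/cache/",
)
# rewritten == "http://example.com/cache/photo.jpg"
```

Non-matching URLs (anything not ending in `.jpg`) pass through untouched, which is the behaviour I was hoping for from the substitutes.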