Filtering in Queue system
|sartori||07/19/2004 10:34 pm|
This is more a feature request than anything...
When downloading a large project, say with 100k URLs in the queue, it is very difficult to make any adjustments to the project without first stopping the download, editing the project settings, and then restarting the download. This seems like a wasteful process in terms of network bandwidth.
It would be far more useful if one could pause the current download and then have the option to:
1. Run an active filter over the current queue. By this I mean, remove or alter URLs from the queue by using Regular expressions.
2. Adjust the settings of the project, including URL Filters, download limits, etc and have these modified settings take effect with any new URLs being read into the queue after you resume or better still, take effect onto the current queue list (same as 1. but from a different place). If you get my meaning?
So for instance, I'm downloading site xxx.com and have downloaded 150k files already and have 100k in the queue at the moment. I realise, looking at the queue, that I don't need to download URLs like xxx.com/file.asp?printversion=true. So I pause the download, run my active filter on the current queue to remove all URLs that match "?printversion=true", and then resume my download. Or, I edit the project settings, add the pattern to the URL Filter list, and select an option that applies the change to the current queue. This saves me having to rerun the project from scratch and reread 150k files from the web, saving bandwidth and time.
I can then easily remove the files on the local drive using a file search or better yet the Active Filter feature could have an option to remove local files based on that filter.
Anyway, is this something that others might like or find useful? Or am I just a freak?
Cheers, and thanks for an awesome program!
|Oleg Chernavin||07/20/2004 02:25 am|
|Thank you for the suggestion. It is already implemented in Offline Explorer Pro. You can right-click on any URL in the Queue and then click Select by Mask. Type the regular expression you want and click the OK button. Offline Explorer Pro will select all URLs in the Queue that match the expression. Then right-click any of the selected links (to preserve the selection) and choose Abort.
You can also use the above to keep the desired URLs and remove all others: just use the Invert Selection feature before aborting them.
I hope this helps.
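For readers who want to see the logic of the steps above outside the program, here is a minimal Python sketch. It is not Offline Explorer's implementation: the function names `select_by_mask` and `abort_selected` and the sample URLs are hypothetical, and the queue is modelled as a plain list of strings.

```python
import re

def select_by_mask(queue, pattern):
    """Return the queued URLs that match the regular expression `pattern`."""
    mask = re.compile(pattern)
    return [url for url in queue if mask.search(url)]

def abort_selected(queue, selected, invert=False):
    """Drop the selected URLs from the queue.

    With invert=True the selection is inverted first, so only the
    originally selected URLs stay in the queue.
    """
    selected_set = set(selected)
    if invert:
        return [url for url in queue if url in selected_set]
    return [url for url in queue if url not in selected_set]

# Hypothetical queue contents, modelled on the example in the first post.
queue = [
    "http://xxx.com/file.asp?id=1",
    "http://xxx.com/file.asp?id=1&printversion=true",
    "http://xxx.com/file.asp?printversion=true",
    "http://xxx.com/index.asp",
]

# "?" is a regex metacharacter, so it is matched here via a character class.
selected = select_by_mask(queue, r"[?&]printversion=true")
remaining = abort_selected(queue, selected)               # print versions removed
kept_only = abort_selected(queue, selected, invert=True)  # only print versions kept
```

Note that a literal "?" has to be escaped (or placed in a character class, as here) when the mask is a regular expression; an unescaped "?printversion=true" would not be a valid pattern in most regex dialects.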
|sartori||07/20/2004 03:13 am|
I should have tried the latest version before making that post.
But what about the second suggestion: adjusting the project settings and having them apply to the current queue?
|Oleg Chernavin||07/20/2004 03:38 am|
|Selecting by mask has been supported for a long time already, since version 3.0.
As for applying changed Project settings to the current queue, I haven't done this yet, because it would work very slowly when you have many files in the Queue. For example, if you opened the Properties dialog and then clicked the OK button, even without making serious changes, Offline Explorer would have to go through all files and verify them against the Project settings.
|Erik||05/18/2005 04:37 am|
|I have a similar issue with the download queue. I'm downloading a web site that puts a specific string of characters in the URL so that the URL stays valid only for a short time. Every 15 minutes or so, the server stops recognizing the previous string and will only accept a new one. Since I'm unable to come even close to completing the project in that time, I need to update the queue. So far, the only way I've found to do this is to perform a global search & replace in the page with all the links, remove URL substitutions, update the project, add the URL substitutions back, and then download missing files. The problem is that OE then needs to check which of the ~50,000 files need to be downloaded. This takes a substantial portion of that 15 minutes, and results in only about 200-400 files per cycle.
Being able to select individual files in the queue would not help in this case, because every single one needs to be changed. Since they all need to be changed in the exact same way, it seems as though a search&replace command would not require a substantial amount of time to perform.
Here is a page as an example:
You'll notice that the address of the picture itself has a 32 character string after the "imagesession=". That string changes on a regular basis and is easy enough to check, but difficult to change in OE's queue.
I'm currently working with a 6MB html file I generated that is essentially a list of links to each picture. That is the file I've been updating and then telling OE to go update. I had decided that following links to the pictures wouldn't be feasible because of the enormous number of files it would generate. Additionally, OE doesn't seem to be able to follow the method this site uses to link to the picture pages. A way to recognize and follow these links would be very useful as well:
|Oleg Chernavin||05/21/2005 01:55 pm|
|Well, an easy solution would be to suspend the queue to a file (.wdq), do the replace there (it is a text file), and then resume from the file. It should be quick enough.
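The replace step could be done in any text editor, but with ~50,000 entries and a 15 minute window a small script may be quicker. The Python sketch below is only an illustration under stated assumptions: the internal layout of the .wdq file is not described in this thread, so the script treats it as opaque text and only swaps the value after "imagesession="; the token is assumed to be 32 letters or digits; and the function names and the example.com URLs are hypothetical.

```python
import re

# Assumption: the session token is exactly 32 letters/digits after "imagesession=".
TOKEN_RE = re.compile(r"(imagesession=)[0-9A-Za-z]{32}")

def replace_session_token(text, new_token):
    """Replace every imagesession token in `text`; return (new_text, count)."""
    if not re.fullmatch(r"[0-9A-Za-z]{32}", new_token):
        raise ValueError("expected a 32 character alphanumeric token")
    # A function replacement avoids backreference clashes when the token starts with a digit.
    return TOKEN_RE.subn(lambda m: m.group(1) + new_token, text)

def rewrite_queue_file(path, new_token):
    """Rewrite a suspended queue file in place and return the number of replacements.

    Assumption: the file's real encoding is unknown, so latin-1 is used
    because it round-trips every byte value unchanged.
    """
    with open(path, "r", encoding="latin-1", newline="") as f:
        text = f.read()
    updated, count = replace_session_token(text, new_token)
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(updated)
    return count

# Small in-memory demonstration with made-up URLs and tokens.
old_token = "a" * 32
new_token = "b" * 32
sample = (
    f"http://example.com/pic.asp?id=1&imagesession={old_token}\n"
    f"http://example.com/pic.asp?id=2&imagesession={old_token}\n"
)
updated_sample, replaced = replace_session_token(sample, new_token)
```

It is worth keeping a copy of the suspended queue file before rewriting it, since the script overwrites the file in place.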