How do I specify files downloaded, *without* affecting files searched for links?

Amanda 06/10/2009 08:14 am

I have always been confused about one aspect of configuring a project in Offline Explorer...

And that is, how do you specify name or location limits on the files that are **downloaded** and NOT files that are loaded (i.e. the html files that OE needs to load in order to look for links)?


For example, if I try to specify the parameters of the files I want downloaded in the "URL Filters / Directory" or "URL Filters / Filename" sections, and then set the appropriate File Filter's location to "Load using URL Filter settings", then nothing gets downloaded (or maybe a couple of files), and then the project stops.

But if I leave the URL Filters alone, and have the File Filter's location set to "Load only from the starting domain", then it works, it just downloads every directory under the sun.


So, how can I specify parameters on what files I want it to actually download, without getting in the way of the files it looks through for links?
Oleg Chernavin 06/10/2009 10:07 am
Can you give me some real example, so I can understand this properly? Some site and which links you want to get and which - do not want. Thank you!

Best regards,
Oleg Chernavin
MP Staff
Amanda 06/10/2009 10:45 am
> Can you give me some real example, so I can understand this properly? Some site and which links you want to get and which - do not want. Thank you!
>
> Best regards,
> Oleg Chernavin
> MP Staff


Okay, we'll start with a simple one:

-------------------------------------

Starting URL:
http://www.xkcd.com/archive/


JPG, JPEG files only located here:
http://imgs.xkcd.com/comics/

-------------------------------------


Now, in addition to the above site, many sites have thumbnail versions of images as well, which I would want to ignore. So, for example, I might want all files located in "http://imgs.xkcd.com/comics/", except those that end in "_t.jpg"
Oleg Chernavin 06/11/2009 05:48 am
I think you should allow downloading from all servers and directories in URL Filters and use URL Filters - Filename section. Add the following keywords to the Included list:

http://www.xkcd.com/archive/*
http://imgs.xkcd.com/comics/*

Excluded list:

_t.jpg

Oleg.
Amanda 06/11/2009 10:53 am
> I think you should allow downloading from all servers and directories in URL Filters and use URL Filters - Filename section. Add the following keywords to the Included list:
>
> http://www.xkcd.com/archive/*
> http://imgs.xkcd.com/comics/*
>
> Excluded list:
>
> _t.jpg
>
> Oleg.


That doesn't work. First of all, if I set the Images File Filters to "Load using URL Filters", it downloads only 1 file and then stops. If I set it to "Load only from the starting domain", it's going to ignore the entire URL Settings section, according to the help file.

And in any case, from the starting URL of "http://www.xkcd.com/archive", all the sublinks go to pages like "http://www.xkcd.com/589/". I would think the File filters you suggest above would prevent pages like "http://www.xkcd.com/589/Index.html" from loading.
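To illustrate the concern, here is a minimal Python sketch of the first Included list. It assumes the keywords behave like shell-style wildcards (Python's `fnmatch`), which is only an approximation and not Offline Explorer's documented matching logic. The helper name `is_included` and the image file name are hypothetical.

```python
from fnmatch import fnmatchcase

# Included keywords from the first suggestion in this thread.
INCLUDED = [
    "http://www.xkcd.com/archive/*",
    "http://imgs.xkcd.com/comics/*",
]


def is_included(url, keywords=INCLUDED):
    """Return True if the URL matches at least one Included keyword.

    Assumption: keywords are treated as shell-style wildcard patterns.
    """
    return any(fnmatchcase(url, keyword) for keyword in keywords)


for url in [
    "http://www.xkcd.com/archive/",
    "http://www.xkcd.com/589/",
    "http://imgs.xkcd.com/comics/example.jpg",  # hypothetical file name
]:
    print(url, is_included(url))
```

Under that assumption the comic page `http://www.xkcd.com/589/` matches neither keyword, so it would never be loaded and the image links on it would never be found.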


As I said, it's extremely confusing that there is no clear way to differentiate between the pages you want OE to load, and the files you actually want downloaded to keep.
Oleg Chernavin 06/11/2009 11:00 am
I see now. I thought you need only the /archive/ folder. Please change Filename - Included list to:

http://www.xkcd.com/*/*
http://imgs.xkcd.com/comics/*

File Filters - Images - "Load using URL Filters settings" in the Location box.

If you want to keep only the images (load Web pages and do not save) then uncheck the File Filters - Text category in the left-side tree.
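The combined effect of these settings can be sketched roughly in Python. This is only a model under stated assumptions, not how Offline Explorer is implemented: keywords containing `*` are treated as shell-style wildcards, keywords without `*` as plain substrings, and "Text category unchecked" is modeled as "load the page but do not save it". The names `matches`, `passes_url_filters`, and `classify`, and the image file names, are hypothetical.

```python
from fnmatch import fnmatchcase

# URL Filters - Filename lists suggested in this thread.
INCLUDED = ["http://www.xkcd.com/*/*", "http://imgs.xkcd.com/comics/*"]
EXCLUDED = ["_t.jpg"]
# Extensions that count as wanted images in this example.
IMAGE_EXTENSIONS = (".jpg", ".jpeg")


def matches(url, keyword):
    """Assumed keyword semantics: wildcard pattern if it has '*', else substring."""
    if "*" in keyword:
        return fnmatchcase(url, keyword)
    return keyword in url


def passes_url_filters(url):
    """Excluded keywords win; otherwise at least one Included keyword must match."""
    if any(matches(url, keyword) for keyword in EXCLUDED):
        return False
    return any(matches(url, keyword) for keyword in INCLUDED)


def classify(url):
    """Return 'skip', 'save' (image kept on disk), or 'load only' (parsed for links)."""
    if not passes_url_filters(url):
        return "skip"
    if url.lower().endswith(IMAGE_EXTENSIONS):
        return "save"
    return "load only"


print(classify("http://www.xkcd.com/589/"))                   # load only
print(classify("http://imgs.xkcd.com/comics/example.jpg"))    # save
print(classify("http://imgs.xkcd.com/comics/example_t.jpg"))  # skip
```

In this model the comic pages pass the URL filters and are scanned for links, while only the non-thumbnail images under `/comics/` are kept.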

Oleg.
Amanda 06/11/2009 04:36 pm
>
> If you want to keep only the images (load Web pages and do not save) then uncheck the File Filters - Text category in the left-side tree.
>
> Oleg.


That helps. It's almost perfect - the download folder only contains one extraneous tree - "www.xkcd.com", which contains 596 empty subdirectories. Is there any remedy for that, or is that as efficient as we can make it?
Oleg Chernavin 06/11/2009 04:50 pm
Strange. Maybe they were left from a previous download? I created a similar project and www.xkcd.com directory is not created at all - I unchecked the File Filters - Text category.

If you delete that directory tree and run the download again, are the directories created again? If yes, can you send me your Project settings? Select the project, use Export - Project Settings - Copy on the toolbar and paste the result to the forum.

Oleg.