Skip URLs vs. Server/Directory/Filename Filters

Author Message
Brad Konia 12/01/2004 09:22 am
Can you please explain the difference (if any) between entering a full or partial URL in the "Skip URLs" box and using the Server/Directory/Filename filters to enter an equivalent URL?

For example, suppose I wanted to prevent OE from spidering any page on www.somedomain.com. As I understand it, I could accomplish this by entering:

http://www.somedomain.com

in the Skip URLs box.

However, couldn't I also accomplish the same thing by creating a Server filter with the keyword "www.somedomain.com"?

Is there any difference between these two methods?
Oleg Chernavin 12/01/2004 10:38 am
The Skip URLs box should contain:

http://www.somedomain.com/

Please pay attention to the trailing slash.

Excluded server keywords should contain:
www.somedomain.com

and it will do exactly the same thing: stop downloading anything on the specified server. The difference is that Skip URLs accepts only full URLs, while the other URL Filters sections accept keywords (parts of URLs, wildcards, etc.).

The Skip URLs feature was added to make processing long lists of exclusions really fast while downloading files. Keywords take much more time to process.

Best regards,
Oleg Chernavin
MP Staff
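
The speed difference described above can be illustrated with a minimal Python sketch. This is not OE's actual implementation; it only assumes that a list of full URLs can be stored in a hash set and checked with a few lookups, while keywords have to be compared one by one. The function names, and the idea that a skipped directory URL covers everything beneath it, are assumptions made for the illustration. Query strings are ignored.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def candidate_keys(url):
    """Return the URL plus each parent directory URL, up to the server root."""
    parts = urlsplit(url)
    root = f"{parts.scheme}://{parts.netloc}"
    path = parts.path or "/"
    keys = [root + path]
    # Walk up the path: /a/b/c.htm -> /a/b/ -> /a/ -> /
    while len(path) > 1:
        path = path.rstrip("/")
        path = path[: path.rfind("/") + 1]
        keys.append(root + path)
    return keys


def skipped_by_url_list(url, skip_urls):
    """Full-URL check (assumed model): a few hash lookups, whatever the list size."""
    return any(key in skip_urls for key in candidate_keys(url))


def skipped_by_server_keyword(url, keywords):
    """Keyword check (assumed model): every keyword is tried against the host."""
    host = urlsplit(url).hostname or ""
    for keyword in keywords:
        if "*" in keyword or "?" in keyword:
            if fnmatchcase(host, keyword):
                return True
        elif keyword in host:
            return True
    return False
```

Under these assumptions, a skip list holding "http://www.somedomain.com/" excludes "http://www.somedomain.com/docs/page.htm", while the same entry without the trailing slash matches nothing, because every generated key for the server root ends in a slash. The keyword check costs one comparison per keyword for every URL, which is where a long exclusion list becomes slow.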
Brad Konia 12/01/2004 12:55 pm
> and it will do exactly the same - stop downloading anything on the specified server. The difference is that the Skip URLs accepts only full URLs to be skipped, while other URL Filters sections can use keywords (parts of URLs with wildcards, etc.)

OK, just to clarify:

When you say that it will "stop downloading", does this mean that it will stop spidering the site entirely, or will it continue to spider the site as per the Level setting, but not save the excluded pages?

For example:

* Suppose I enter http://www.somedomain.com/index.htm in the Skip URLs box and suppose that page contains a link to http://www.anotherdomain.com.

* Now, further suppose that OE comes across a page that contains a link to http://www.somedomain.com/index.htm.

I realize that it won't download the somedomain.com page. However, will it still parse that page and follow the link to anotherdomain.com, or will it ignore the somedomain.com page entirely and not spider any of its links?
Oleg Chernavin 12/02/2004 06:59 am
It means that OE will not follow (download) links that are excluded. Since the excluded page is never downloaded, it is not parsed, and the links on it are not followed.

Oleg.
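
The behavior described in this answer can be sketched with a toy crawler in Python. This is only an illustration under stated assumptions, not OE's code: the `PAGES` dictionary is a hypothetical in-memory stand-in for the web, `http://start.example/` is a made-up starting page, and `max_level` loosely plays the role of the Level setting.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
PAGES = {
    "http://start.example/": ["http://www.somedomain.com/index.htm"],
    "http://www.somedomain.com/index.htm": ["http://www.anotherdomain.com/"],
    "http://www.anotherdomain.com/": [],
}


def crawl(start, skip_urls, max_level):
    """Breadth-first crawl that never downloads, and so never parses, skipped URLs."""
    downloaded = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, level = queue.popleft()
        if url in skip_urls:
            continue  # not downloaded, so the links on this page are never seen
        downloaded.append(url)
        if level == max_level:
            continue  # depth limit reached: keep the page, do not follow its links
        for link in PAGES.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return downloaded
```

With an empty skip list and `max_level=2`, all three pages are downloaded. With "http://www.somedomain.com/index.htm" in the skip list, only the starting page is downloaded, so anotherdomain.com is never reached through the excluded page. It could still be downloaded if some other, non-excluded page linked to it.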