Downloading link(s) containing "?"

Gerald 10/28/2003 05:18 am
How can I filter links that contain "?"? I would like to filter for all links matching:

http://sunsolve.sun.com/pub-cgi/show.pl?target=content*
http://sunsolve.sun.com/pub-cgi/show.pl?target=home
http://sunsolve.sun.com/pub-cgi/secBulleting.pl*
http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fsalert*
http://sunsolve.sun.com/pub-cgi/search.pl?mode=results&origin=advanced&range=20&so=date&coll=fsalert&zone_32=category:security

I'm also trying to retrieve the following links, but only if they are linked from the pages above:

http://sunsolve.sun.com/pub-cgi/findPatch.pl?patchId=*

I want to exclude all other pages in /pub-cgi/.

What's the best way to accomplish this?

Thanks a million,
Gerald W. Wise
Oleg Chernavin 10/29/2003 08:30 am
I would suggest using URL Filters | Filename | Custom Configuration and adding the following keywords to the Include list:

show.pl?target=content*
show.pl?target=home
secBulleting.pl*
retrieve.pl?doc=fsalert*
search.pl?mode=results&origin=advanced&range=20&so=date&coll=fsalert&zone_32=category:security
findPatch.pl?patchId=*

With that setup, no other links will be loaded.

Best regards,
Oleg Chernavin
MP Staff
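Because Gerald is used to regular expressions, note that "?" is a wildcard there (and in shell globs), whereas in the keywords above it is part of the URL itself. The Python sketch below is a toy model for illustration, not Offline Explorer's code. Its rules are our own assumptions: every character except "*" is matched literally; "*" in the last path segment of a keyword does not cross "/", while "*" earlier in a keyword may; and a URL is accepted if a keyword matches the trailing part of it. The helper names `keyword_to_regex` and `is_included` and the sample URLs `show.pl?target=patches` and `show.plXtarget=home` are hypothetical.

```python
import re

# Include keywords from Oleg's answer, copied verbatim.
INCLUDE = [
    "show.pl?target=content*",
    "show.pl?target=home",
    "secBulleting.pl*",
    "retrieve.pl?doc=fsalert*",
    "search.pl?mode=results&origin=advanced&range=20&so=date&coll=fsalert&zone_32=category:security",
    "findPatch.pl?patchId=*",
]


def keyword_to_regex(keyword):
    """Compile a keyword into a regex under this sketch's assumed rules."""
    # Split off the last path segment, e.g. ("http://host/dir*", "/", "*.pdf").
    head, slash, last = keyword.rpartition("/")
    # Escape everything except '*', so '?', '.', '&' match themselves.
    head_re = ".*".join(re.escape(p) for p in head.split("*"))     # '*' may cross '/'
    last_re = "[^/]*".join(re.escape(p) for p in last.split("*"))  # '*' stays in one segment
    # '\Z' ties the keyword to the end of the URL; the start is left free.
    return re.compile(head_re + slash + last_re + r"\Z")


def is_included(url, keywords=INCLUDE):
    """True if any Include keyword matches the URL (hypothetical helper)."""
    return any(keyword_to_regex(k).search(url) for k in keywords)


if __name__ == "__main__":
    base = "http://sunsolve.sun.com/pub-cgi/"
    for name in ("show.pl?target=home", "findPatch.pl?patchId=112233-01",
                 "show.pl?target=patches", "show.plXtarget=home"):
        print(f"{name:35} {is_included(base + name)}")
```

Under these assumptions `show.plXtarget=home` is rejected because "?" is literal; a glob-style matcher such as Python's `fnmatch` would accept it.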
Gerald 11/01/2003 08:20 am
Ok, I'm starting to understand some of the login in OEP. It's been hard to grasp after using a few products that have their roots in UNIX and typically use regular expressions for matching.

Correct me at any point if you feel that my observations are incorrect...

Using the directory and file filter options, you can include or exclude items on a directory basis or on a file basis, but not both simultaneously. For instance, if PDF files are scattered all over the site but usually reside in a directory named "pdf", I can exclude PDF files on a directory basis or on a file basis, but not both. Given the following files (not a real-world example):

http://www.somesite.com/newsreleases/pdf/20021030.pdf
http://www.somesite.com/newsreleases/pdf/20021130.pdf
http://www.somesite.com/newsreleases/pdf/20031030.pdf
http://www.somesite.com/newsreleases/pdf/20031130.pdf

http://www.somesite.com/notices/pdf/20021030.pdf
http://www.somesite.com/notices/pdf/20021130.pdf
http://www.somesite.com/notices/pdf/20031030.pdf
http://www.somesite.com/notices/pdf/20031130.pdf

Let's say that I am interested in all the news releases, but only in the notices for the current month. I have been unable to create a filter that obtains all the files in the newsreleases/pdf/ directory but only those files in the notices directory whose names start with 200311 (200311*.pdf). I can't seem to add a directory path to a file filter, or a file to a directory filter. Am I missing something?

The only way I have been able to accomplish this is to split the project into two: one with a directory filter that excludes notices/pdf/, and another that has ONLY http://www.somesite.com/notices/pdf/ as the URL and 200311 as the file filter.

Comments?

Thanks,
Gerald

Gerald 11/01/2003 02:11 pm
> Ok, I'm starting to understand some of the login in OEP.

Oh, how I hate typos... this was supposed to read, "Ok, I'm starting to understand some of the logic in OEP."
Oleg Chernavin 11/02/2003 01:03 pm
This should not be hard. Please allow loading from all directories in URL Filters | Directory.

Now go to URL Filters | Filename and add the following keywords:

http://www.somesite.com/newsreleases*/*.pdf
http://www.somesite.com/notices/pdf/{:longyear}{:0month}*.pdf
http://www.somesite.com/*

The last line allows loading all files from the root directory of the site. Add a few more directories if the HTML pages that link to the PDFs are located there.

The first line allows all PDFs from the newsreleases directory and its subdirectories. The second line allows only 200311*.pdf files from /notices/pdf/: {:longyear} is replaced with the current year (2003), and {:0month} with the current month number, padded with a leading zero if necessary. These placeholders are called URL Macros in Offline Explorer.

I hope this helps.

Oleg.
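To see how the three keywords and the two URL Macros combine on Gerald's sample files, here is a minimal Python sketch. It is a toy model, not Offline Explorer's implementation: `expand_macros` handles only the two macros named in this thread, and `keyword_to_regex` uses our own assumed rules (everything except "*" is literal; "*" in the last path segment does not cross "/", earlier "*" may; a keyword matches the trailing part of a URL). All helper names and the `index.html` URL in the test are hypothetical.

```python
import re
from datetime import date

# The three Filename keywords from Oleg's answer, copied verbatim.
FILENAME_KEYWORDS = [
    "http://www.somesite.com/newsreleases*/*.pdf",
    "http://www.somesite.com/notices/pdf/{:longyear}{:0month}*.pdf",
    "http://www.somesite.com/*",
]


def expand_macros(keyword, today):
    """Replace the two URL macros named in this thread (hypothetical helper)."""
    return (keyword
            .replace("{:longyear}", f"{today.year:04d}")   # e.g. 2003
            .replace("{:0month}", f"{today.month:02d}"))   # e.g. 11, or 01 for January


def keyword_to_regex(keyword):
    """Compile a keyword into a regex under this sketch's assumed rules."""
    head, slash, last = keyword.rpartition("/")
    head_re = ".*".join(re.escape(p) for p in head.split("*"))     # '*' may cross '/'
    last_re = "[^/]*".join(re.escape(p) for p in last.split("*"))  # '*' stays in one segment
    return re.compile(head_re + slash + last_re + r"\Z")


def allowed(url, today):
    """True if any keyword, after macro expansion, matches the URL."""
    return any(keyword_to_regex(expand_macros(k, today)).search(url)
               for k in FILENAME_KEYWORDS)


if __name__ == "__main__":
    today = date(2003, 11, 2)  # the date of Oleg's reply
    for section in ("newsreleases", "notices"):
        for name in ("20021030", "20021130", "20031030", "20031130"):
            url = f"http://www.somesite.com/{section}/pdf/{name}.pdf"
            print(url, allowed(url, today))
```

Run with the date of Oleg's reply, this sketch keeps all four news releases and only notices/pdf/20031130.pdf from the notices directory. Macros are expanded before the keyword is compiled, so the braces never reach the regex.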
Gerald 11/02/2003 06:52 pm
So, the http://www.somesite.com/* won't override the other filters and get EVERYTHING from ALL directories?
Oleg Chernavin 11/03/2003 03:20 am
> So, the http://www.somesite.com/* won't override the other filters and get EVERYTHING from ALL directories?

Not from all, only from the root directory. http://www.somesite.com/*/* would really override the other filters and allow all files from all directories.

Oleg.
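The difference between the two keywords can be reproduced with a small Python sketch. This is a toy model under our own assumptions, not Offline Explorer's documented matching: "*" in the last path segment does not cross "/", "*" earlier in the keyword may, and a keyword must match the end of the URL. The names `keyword_to_regex`, `matches`, `ROOT_ONLY`, `ALL_DIRS` and the `index.html` URL are hypothetical.

```python
import re

ROOT_ONLY = "http://www.somesite.com/*"
ALL_DIRS = "http://www.somesite.com/*/*"


def keyword_to_regex(keyword):
    """Compile a keyword into a regex under this sketch's assumed rules."""
    head, slash, last = keyword.rpartition("/")
    head_re = ".*".join(re.escape(p) for p in head.split("*"))     # '*' may cross '/'
    last_re = "[^/]*".join(re.escape(p) for p in last.split("*"))  # '*' stays in one segment
    return re.compile(head_re + slash + last_re + r"\Z")


def matches(keyword, url):
    """True if the keyword matches the end of the URL."""
    return keyword_to_regex(keyword).search(url) is not None


if __name__ == "__main__":
    for url in ("http://www.somesite.com/index.html",
                "http://www.somesite.com/notices/pdf/20031030.pdf"):
        print(url, "root-only:", matches(ROOT_ONLY, url),
              "all-dirs:", matches(ALL_DIRS, url))
```

In this toy model `*/*` needs at least one directory level, so root-level files still depend on the `http://www.somesite.com/*` keyword; how Offline Explorer itself treats that case is not stated in the thread.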