Filtering out useless links.. Oleg?
|Steve Lim||10/20/2004 02:47 am|
|Here is an example of the rootpage..
This is a filtered search from a forum.
I wish to only check out the links in the main section.. from my testing.. it appears they all begin with
http://220.127.116.11/showthread.php*... (* for wildcard)
I wish to only investigate these links at level 0.. then within these links look for references to movie files (.avi .mov) etc. I guess this would be @ level 1?
|10/20/2004 03:58 am|
|> I wish to only investigate these links at level 0
Level 0 would be the starting page.
>.. then within these links look for references to movie files (.avi .mov) etc.
> I guess this would be @ level 1?
Level 1 holds only the links to the movies.
(The movies themselves are at level 2.)
But you need at least level 3, because you have more than one search result page.
Load using URL Filters settings
Load from any site:
View *included* files keywords:
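To see why level 3 is needed, here is a rough Python sketch that counts levels over a made-up link graph. The page names and the graph itself are hypothetical; this only illustrates the counting and is not how Offline Explorer works internally.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages/files it links to.
# Search result page 1 links to page 2, and each result page links to threads.
LINKS = {
    "search.php?page=1": ["search.php?page=2", "showthread.php?t=1"],
    "search.php?page=2": ["showthread.php?t=2"],
    "showthread.php?t=1": ["clip1.avi"],
    "showthread.php?t=2": ["clip2.mov"],
}

def link_levels(start, links):
    """Breadth-first walk; returns the level at which each URL is first reached."""
    levels = {start: 0}          # the starting page is level 0
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in levels:
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels

levels = link_levels("search.php?page=1", LINKS)
for url, level in sorted(levels.items(), key=lambda item: item[1]):
    print(level, url)
```

In this toy graph the movie linked from a thread on the first result page sits at level 2, while the movie reached through the second result page sits at level 3.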
|Steve Lim||10/20/2004 05:55 am|
|Thanks for replying Anonymous...
Unfortunately.. it doesn't seem to work for me.. even more interesting is that the search page seems to time out or something.. after I try the search in the browser.. when I try it in OE, it says the search does not exist.. it seems that is the first hurdle to overcome.
|10/20/2004 06:15 am|
|> Thanks for replying Anonymous...
> Unfortunately.. it doesn't seem to work for me..
> even more interesting is that the search page seems to time out or something..
Yes. Of course they are not storing the search result for all time! But you should have enough time when you try to start a new search.
Then you have to adjust the Start-URL and the searchid in the filename URL Filters:
You can find the searchid in the link to the other search result pages. (Page 2 of 2)
("&t=*&highlight=reel" should be unchanged if you are searching for the same keyword)
If this still doesn't work, you could try first downloading only the text files (disable the Video - File Filter). Then enable the Video - File Filter and load the missing files.
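If it helps, the searchid is an ordinary query-string parameter, so it can be read from the "Page 2" link and swapped into an older URL. A small Python sketch follows; the host name, the other parameters, and the helper names are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def get_searchid(url):
    """Return the searchid value from a search URL, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("searchid")
    return values[0] if values else None

def replace_searchid(url, new_id):
    """Return the same URL with its searchid replaced by new_id."""
    parts = urlparse(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query["searchid"] = [str(new_id)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

# Hypothetical URLs: an expired start URL and a fresh "Page 2" link.
old_start = "http://forum.example/search.php?searchid=111&pp=25&page=1"
page2_link = "http://forum.example/search.php?searchid=222&pp=25&page=2"

fresh_id = get_searchid(page2_link)
new_start = replace_searchid(old_start, fresh_id)
print(new_start)
```

The same fresh searchid would then go into the URL Filters keyword as well.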
|Oleg Chernavin||10/20/2004 06:16 am|
|I would also suggest using the Internal Browser of Offline Explorer to make the search request, then taking the resulting URL from it.
|Steve Lim||10/20/2004 06:57 am|
|I think there is no simple way to do what I want.
My question is can I specify rules for certain levels of the search only??
For example.. from the main page (which I now found is http://18.104.22.168/search.php?s=f6c39605c0230880c8ee64f77afdaec0&searchid=546656) I wish to ONLY explore the links that start with -->
and to ignore ALL other links.. but once inside the showthread.php pages.. I wish to d/l MOVIE files from any source.
I hope this is clear.
|Oleg Chernavin||10/20/2004 07:18 am|
|You can simply specify the following keyword:
showthread.php?*
and also allow downloading from any site in File Filters | Video.
|Steve Lim||10/20/2004 08:25 am|
|> You can simply specify the following keyword:
> > showthread.php?*
> > and also allow downloading from any site of the File Filters | Video
> > Oleg.
Thanks Oleg. But what if showthread* is at level 0.. leading to a page that has a movie in the URL
If I just use the inclusive keyword.. will it miss out the actual final URL that I seek?
|Oleg Chernavin||10/20/2004 08:29 am|
|Yes, you will need to allow Level=1.
|Steve Lim||10/20/2004 08:41 am|
|> Yes, you will need to allow Level=1.
> > Oleg.
Well. Thanks Oleg. But I'm afraid this is doing what I feared.. perhaps I am not being clear enough (very likely) =)
Your solution DOES indeed limit the searches to urls with showthread* but like I said.. the final .mov that I am looking for could be on a URL @ www.yahoo.com/test.mov for example..
Your setting completely misses it.
Here is my last try @ explaining it. =)
this is the base page which I call level 0
from the basepage.. all the URLs that I want to spider into have the prefix http://22.214.171.124/showthread.php?*
all the other small links I wish to filter out.
after I get into the http://126.96.36.199/showthread.php?* pages... I wish to look out for ANY url with a .mov or .avi etc... it could be from an external domain/server.
eg. one of the showthread pages leads me to a page with http://www.yahoo.com/test.mov ... I want to download THIS movie. (final objective)
I hope this is clear.
Thanks for the time.
|Oleg Chernavin||10/20/2004 09:13 am|
|You need to allow downloading from any site in File Filters | Video. This makes OE obey the "showthread" keyword for all files except videos; video links will be loaded from any server.
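The combined rule can be pictured with a short Python sketch. This is only an illustration of the logic described above, not Offline Explorer's actual code; the function names and the forum host are hypothetical, and the keyword is treated as a plain substring match, which is an assumption.

```python
from urllib.parse import urlparse

# Extensions named in the thread; a real video filter would list more.
VIDEO_EXTENSIONS = (".avi", ".mov")

# Assumption: the keyword "showthread.php?*" is treated as a substring match.
PAGE_KEYWORD = "showthread.php?"

def is_video(url):
    """True if the URL path ends with one of the video extensions."""
    return urlparse(url).path.lower().endswith(VIDEO_EXTENSIONS)

def should_load(url):
    """Videos are loaded from any server; everything else must match the keyword."""
    if is_video(url):
        return True
    return PAGE_KEYWORD in url

for candidate in (
    "http://forum.example/showthread.php?t=1",   # thread page: followed
    "http://forum.example/member.php?u=5",       # other forum link: skipped
    "http://www.yahoo.com/test.mov",             # external movie: downloaded
    "http://www.yahoo.com/index.html",           # external page: skipped
):
    print(should_load(candidate), candidate)
```

So the keyword keeps the crawl on the showthread pages, while the video check lets a movie such as http://www.yahoo.com/test.mov through regardless of its server.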