Key Words filter?

Author Message
Michael 03/29/2006 07:01 am
I'm wondering why this hasn't come up yet...
Can we OE users define special "keywords" (for example: games, playstation, xbox) and have OE grab only the RELEVANT pages from websites, meaning those that contain at least one of the keywords in the body content?
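
To make the request concrete, here is a rough Python sketch of the check being asked for. It only illustrates the idea and says nothing about how OE works internally; the function name `page_is_relevant` and the crude tag stripping are assumptions made for this example.

```python
import re

def page_is_relevant(html, keywords):
    """Return True if at least one keyword occurs in the page's body text."""
    # Hypothetical helper: look only inside <body>...</body> when it is present,
    # so that a keyword in the <title> or <meta> tags does not count.
    match = re.search(r"<body[^>]*>(.*?)</body>", html, re.IGNORECASE | re.DOTALL)
    body = match.group(1) if match else html
    # Crude tag stripping, so that markup itself never produces a match.
    text = re.sub(r"<[^>]+>", " ", body).lower()
    # Case-insensitive, whole-word match; one hit is enough.
    return any(
        re.search(r"\b" + re.escape(keyword.lower()) + r"\b", text)
        for keyword in keywords
    )
```

For example, `page_is_relevant(html, ["games", "playstation", "xbox"])` would accept any page whose body mentions at least one of the three words.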

I just think it's an important feature...

This way I could enter www.cnn.com and get Xbox-related news daily! :) YAHHHOO!

Think about it!
Oleg Chernavin 03/29/2006 07:01 am
There is such code available. However, the biggest problem is that CNN.com and similar sites are very big, so checking for updates and news would take hours or even days on most connections.

What do you think?

Best regards,
Oleg Chernavin
MetaProducts corp.
Michael 03/29/2006 07:01 am
I'm not talking necessarily about CNN or other big sites. Even small sites may benefit A LOT from this feature. It must, however, download only relevant files and remember which files are irrelevant, so that on the next run of the site OE won't try to re-download and re-parse files that were already "marked" as irrelevant.

As for connection speed or disk space, I don't think that's a problem, since this feature is supposed to FILTER out the pages we don't need, thus focusing the harvesting on the needed and relevant pages alone.
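
As a rough illustration of the "mark as irrelevant and skip next time" idea, here is a Python sketch. It is not OE's actual code; the class `IrrelevantPages`, the function `harvest`, and the JSON file used to remember URLs are all hypothetical names chosen for this example. The `fetch` and `is_relevant` arguments stand for whatever download routine and keyword check are in use.

```python
import json
import os

class IrrelevantPages:
    """Remembers URLs already judged irrelevant so later runs can skip them."""

    def __init__(self, path):
        self.path = path
        self.urls = set()
        # Load the marks left by a previous run, if any.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.urls = set(json.load(f))

    def should_skip(self, url):
        return url in self.urls

    def mark(self, url):
        self.urls.add(url)

    def save(self):
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(sorted(self.urls), f)


def harvest(urls, fetch, is_relevant, skipped):
    """Return {url: html} for relevant pages, fetching only unmarked URLs."""
    kept = {}
    for url in urls:
        if skipped.should_skip(url):
            continue  # marked on an earlier run: no download, no parsing
        html = fetch(url)
        if is_relevant(html):
            kept[url] = html
        else:
            skipped.mark(url)
    skipped.save()
    return kept
```

One caveat with this approach: a page marked irrelevant today may gain relevant content later, so a real implementation would probably need the marks to expire or be rechecked from time to time.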

Let me know if you want that new code tested.

Michael.

Michael 03/29/2006 07:01 am
Any news about that?

Thanks

Oleg Chernavin 03/29/2006 07:01 am
Sorry for the delay, it will take some time to prepare it.

Oleg.
Robert 03/29/2006 07:01 am
> Sorry for the delay, it will take some time to prepare it.
>
> Oleg.

Hello

That feature sounds very interesting :)

Bye
Oleg Chernavin 03/29/2006 07:01 am
Yes. And it is already implemented, starting from version 3.0. It is available in the Pro and Enterprise editions of Offline Explorer.

Oleg.