PARSING HTML files TO download LINKED files BUT NOT the parsed ones?

Basile PAILLET 03/29/2006 07:02 am
Hi Oleg,

I'd like to know if there's a way in OEP to only PARSE selected files
(defined by keyword) so that I can DOWNLOAD the files
they are LINKED to (also defined by keyword)
WITHOUT downloading the parsed files?

Example (for clarity's sake) with an imaginary three-level site:
1st level = ZOOS : zoo1.htm, zoo2.htm, zoo3.htm...
2nd level = CAGES : cage1ofzoo1.htm, cage2ofzoo1.htm, cage3ofzoo1.htm...
3rd level = ANIMALS : animal1ofcage1ofzoo1.htm, animal2ofcage1ofzoo1.htm...

I want to PARSE all the zoo*.htm and cage*.htm files WITHOUT DOWNLOADING THEM,
in order to DOWNLOAD ONLY the animal*.htm files they link to.
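To make the request concrete, here is a minimal Python sketch of the behavior described above. It is not how Offline Explorer Pro works internally. It assumes a flat, hypothetical site held in a dict (`SITE`), plain filenames as links, and prefix matching as the "keyword" rule: zoo and cage pages are fetched and parsed in memory only, and just the animal pages are kept.

```python
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def crawl(fetch, start, parse_prefixes, save_prefixes):
    """Follow links only from pages whose names start with parse_prefixes.

    Pages are held in memory; the function returns just the pages whose
    names start with save_prefixes, i.e. the ones worth writing to disk.
    """
    saved = {}
    seen = {start}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        html = fetch(name)  # kept in memory; nothing is written here
        if name.startswith(save_prefixes):
            saved[name] = html
        if name.startswith(parse_prefixes):
            collector = LinkCollector()
            collector.feed(html)
            for link in collector.links:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    return saved


# Hypothetical three-level site modeled on the example above.
SITE = {
    "zoo1.htm": '<a href="cage1ofzoo1.htm">cage 1</a> <a href="cage2ofzoo1.htm">cage 2</a>',
    "cage1ofzoo1.htm": '<a href="animal1ofcage1ofzoo1.htm">lion</a>',
    "cage2ofzoo1.htm": '<a href="animal1ofcage2ofzoo1.htm">zebra</a>',
    "animal1ofcage1ofzoo1.htm": "<p>lion</p>",
    "animal1ofcage2ofzoo1.htm": "<p>zebra</p>",
}

kept = crawl(SITE.__getitem__, "zoo1.htm", ("zoo", "cage"), ("animal",))
print(sorted(kept))  # ['animal1ofcage1ofzoo1.htm', 'animal1ofcage2ofzoo1.htm']
```

Prefix matching is used because a plain substring test for "zoo" would also match every animal page, whose names contain "ofzoo1".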

Regards,

Basile
Oleg Chernavin 03/29/2006 07:02 am
Dear Basile PAILLET,

I am sorry, but there is no such feature. However, you can do the following: download all these files, then go to the Project Map, right-click there, select "Selection window" and enter the desired keyword to find and delete such files.
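Outside the program, a similar cleanup could be approximated with a short script run over the download folder once the project finishes. This is a minimal Python sketch, not the Project Map's Selection window; the folder and the `delete_matching` helper are hypothetical. It matches on filename prefixes because, in the zoo example, a substring such as "cage" would also hit every animal page.

```python
import tempfile
from pathlib import Path


def delete_matching(root, prefixes):
    """Delete files under root whose names start with any of prefixes; return their names."""
    deleted = []
    # sorted() materializes the listing first, so nothing is deleted mid-iteration.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.name.startswith(prefixes):
            path.unlink()
            deleted.append(path.name)
    return deleted


# Demo on a throwaway folder standing in for a finished download.
with tempfile.TemporaryDirectory() as root:
    for name in ("zoo1.htm", "cage1ofzoo1.htm", "animal1ofcage1ofzoo1.htm"):
        Path(root, name).write_text("<html></html>", encoding="utf-8")
    print(delete_matching(root, ("zoo", "cage")))  # ['cage1ofzoo1.htm', 'zoo1.htm']
    print([p.name for p in Path(root).iterdir()])  # ['animal1ofcage1ofzoo1.htm']
```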

I hope this helps.

Best regards,
Oleg Chernavin
MetaProducts corp.
Basile PAILLET 03/29/2006 07:02 am
Thanks for your reply!

I will do as you say.

Would it be technically possible to include the aforementioned feature
in a future release of OEP? That is, is it compatible with the way OEP works?

Regards,

Basile
Oleg Chernavin 03/29/2006 07:02 am
It could be done as a kind of "Auto Delete certain files" feature: when the download completes, it would delete certain files according to keywords or masks. Would that satisfy you?

Oleg.
Basile PAILLET 03/29/2006 07:02 am
Dear Oleg,

An Auto Delete feature would surely help, but it should work in real time,
i.e. as soon as the unwanted file is no longer useful
(maybe right after it has been parsed?).

That said, the heart of the problem remains:
In OEP, as I understand it, a file has to be DOWNLOADED (i.e. written locally)
before it can be PARSED.

Am I right? And if so, why?

> SIDE NOTE: Real-World Example
>
> Downloading all the "Q_*.html" files in
> http://www.experts-exchange.com/Developer/Programming/Programming_Languages/Delphi/
>
> My filters:
> Protocol = Starting
> Server = Starting
> Directory = Custom
> = Included = Programming/Programming_Languages/Delphi/
> = Excluded = qEmailFriend | memberNewFree | memberProfile | memberLoginForm
> Filename = Custom
> = Included = Q_ | viewQuestion | order
> = Excluded = memberProfile.jsp | memberNewFree.jsp | qEmailFriend.jsp | memberLoginForm.jsp | memberRank.jsp | register.jsp | login.jsp
>
> I end up with (for now...):
> 6000+ WANTED Q_*.html files (60 MB+)
> 4000+ UNWANTED viewQuestionHistory.jsp?* files (around 40 MB)
> 4000+ UNWANTED ?[?]order=* files (around 40 MB)
>
> It takes approximately FOREVER, even though
> I recently switched to a DSL connection!!!
>
> But I have to include/download the "viewQuestion" & "order" files
> for them to be parsed for the links to the Q_*.html files...
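The filter settings above can be read as include/exclude keyword lists applied to the directory and filename parts of each URL. The Python sketch below assumes that reading (a part passes if it contains at least one Included keyword and no Excluded keyword), which may not match Offline Explorer Pro's exact rules; the sample Q_ and viewQuestionHistory URLs are hypothetical. It shows why the viewQuestion and order pages must pass the filters even though only the Q_ pages are wanted.

```python
import posixpath
from urllib.parse import urlsplit

# Keyword lists copied from the filter settings above.
DIR_INCLUDE = ["Programming/Programming_Languages/Delphi/"]
DIR_EXCLUDE = ["qEmailFriend", "memberNewFree", "memberProfile", "memberLoginForm"]
FILE_INCLUDE = ["Q_", "viewQuestion", "order"]
FILE_EXCLUDE = ["memberProfile.jsp", "memberNewFree.jsp", "qEmailFriend.jsp",
                "memberLoginForm.jsp", "memberRank.jsp", "register.jsp", "login.jsp"]


def passes(text, include, exclude):
    """Assumed rule: at least one include keyword and no exclude keyword."""
    return any(k in text for k in include) and not any(k in text for k in exclude)


def split_url(url):
    """Return (directory, filename), keeping any query string on the filename."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    if parts.query:
        filename += "?" + parts.query
    return directory + "/", filename


def accepted(url):
    """True if the URL would be downloaded (and therefore parsed)."""
    directory, filename = split_url(url)
    return (passes(directory, DIR_INCLUDE, DIR_EXCLUDE)
            and passes(filename, FILE_INCLUDE, FILE_EXCLUDE))


def wanted(url):
    """True only for the Q_* pages that are actually worth keeping."""
    return split_url(url)[1].startswith("Q_")


BASE = "http://www.experts-exchange.com/Developer/Programming/Programming_Languages/Delphi/"
for url in (BASE + "Q_10000001.html", BASE + "viewQuestionHistory.jsp?qid=1",
            BASE + "memberProfile.jsp"):
    print(url.rsplit("/", 1)[1], accepted(url), wanted(url))
```

Every URL that is accepted but not wanted is a page downloaded only so its links can be read, which is exactly the overhead described above.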

Hoping I was clear enough,

Thank you for your help,

Basile
Oleg Chernavin 03/29/2006 07:02 am
Basile,

Sorry for the long time before I answer. What if I add to Offline Explorer Pro another parameter to the URLs field, like:

http://www.someserver.com/page.htm
DeleteFiles=viewQuestionHistory.jsp?*;?[?]order=*

Would this satisfy you if OE Pro will erase such files immediately after downloading and parsing? This should be quite easy to add.
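The proposed behavior (save a page, extract its links, then erase it at once if its name matches a mask) can be sketched in Python as follows. This is an illustration only: the `SITE` dict, the file names and the `download_parse_delete` helper are hypothetical, and Python's `fnmatch` rules are assumed for the masks (`?` matches any one character, `*` any run of characters), which may differ from OE Pro's.

```python
import fnmatch
import tempfile
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import quote


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def download_parse_delete(fetch, start, out_dir, delete_masks):
    """Save each page, parse the saved copy, then delete it if its name matches a mask."""
    out = Path(out_dir)
    seen, queue = {start}, deque([start])
    while queue:
        name = queue.popleft()
        local = out / quote(name, safe="")  # escape '?' and '=' for the file system
        local.write_text(fetch(name), encoding="utf-8")
        collector = LinkCollector()
        collector.feed(local.read_text(encoding="utf-8"))
        for link in collector.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
        if any(fnmatch.fnmatchcase(name, mask) for mask in delete_masks):
            local.unlink()  # its links are already queued, so the page is no longer needed
    return sorted(p.name for p in out.iterdir())


# Hypothetical pages: the history page is needed only to reach Q_1.html.
SITE = {
    "index.htm": '<a href="viewQuestionHistory.jsp?qid=1">history</a>',
    "viewQuestionHistory.jsp?qid=1": '<a href="Q_1.html">question</a>',
    "Q_1.html": "<p>question</p>",
}

with tempfile.TemporaryDirectory() as tmp:
    print(download_parse_delete(SITE.__getitem__, "index.htm", tmp, ["viewQuestionHistory.jsp?*"]))
    # ['Q_1.html', 'index.htm']
```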

Oleg.
Basile PAILLET 03/29/2006 07:02 am
Hello Oleg,

> Sorry for the long time before I answer.
No problem, I'm pleasantly surprised!

>What if I add to Offline Explorer Pro another parameter to the URLs field, like:
> http://www.someserver.com/page.htm
> DeleteFiles=viewQuestionHistory.jsp?*;?[?]order=*

Yeah, it sounds easy to use AND flexible (wildcards)!

> Would this satisfy you if OE Pro will erase such files immediately after downloading and parsing?

Definitely, it's what I'm looking for...

>This should be quite easy to add.

Great then !

Thank you very much,

Basile
Oleg Chernavin 03/29/2006 07:02 am
OK. I just added that feature. The exact syntax is the following:

http://www.someserver.com/page.htm
DeleteAfterParsing=viewQuestionHistory.jsp?*,?[?]order=*

Please use spaces or commas to separate keywords. The updated oe.exe file is available here:

http://www.metaproducts.com/download/betas/oep1256.zip
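For readers scripting something similar, the final syntax (one `DeleteAfterParsing=` line whose masks are separated by spaces or commas) could be read as below. This is a minimal Python sketch: the helper names are hypothetical, and `fnmatch` semantics are an assumption. Under those rules `[?]` is a character class matching a literal `?`, which may not be what OE Pro does with `?[?]order=*`.

```python
import fnmatch
import re


def parse_delete_after_parsing(line):
    """Return the masks from a 'DeleteAfterParsing=...' line, or [] for any other line."""
    key, sep, value = line.partition("=")  # split on the first '=' only
    if not sep or key.strip() != "DeleteAfterParsing":
        return []
    # Masks are separated by commas and/or whitespace.
    return [mask for mask in re.split(r"[,\s]+", value.strip()) if mask]


def should_delete(filename, masks):
    """True if filename matches any mask (case-sensitive fnmatch rules)."""
    return any(fnmatch.fnmatchcase(filename, mask) for mask in masks)


masks = parse_delete_after_parsing("DeleteAfterParsing=viewQuestionHistory.jsp?*,?[?]order=*")
print(masks)  # ['viewQuestionHistory.jsp?*', '?[?]order=*']
print(should_delete("viewQuestionHistory.jsp?qid=1", masks))  # True
```

Only the first `=` is treated as the separator, so masks that themselves contain `=` (like `?[?]order=*`) survive intact.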

Oleg.
Basile PAILLET 03/29/2006 07:02 am
GREAT !

Thank you Oleg !

Basile :))
Oleg Chernavin 03/29/2006 07:02 am
You are welcome!

Oleg.