JavaScript and OLE

Author Message
Joane 10/07/2010 08:14 am
Hey, Oleg. I've got a question about the "Add the next clicked link as a project" button and OLE automation.

Here is the outline of the problem:
The URL of the page to be downloaded is computed by a simple JavaScript snippet. But the script itself is added to the page asynchronously (Ajax).

The question is how to get the URL from a script that is not even on the page at first and, most importantly, how to do it without using the "Add the next clicked link as a project" button (i.e. do it with OLE).

I think that the problem could be easily solved if the functionality of the "Add the next clicked link as a project" button were available to OLE automation.

So, after loading the page into the internal browser, and knowing what the script added by Ajax looks like, OE would get the internal browser to evaluate that script and then save the resulting URL and cookies to the URL field of the project. By the way, being able to get the cookies of a page loaded into the internal browser through OLE would also be quite a valuable feature.

I wonder if that is possible.

Thank you in advance. Joane.
Oleg Chernavin 10/08/2010 11:50 am
Sorry for the delay!

What you ask is really not an easy thing, because "Add the next..." requires a manual user action: the user has to click a certain link in the browser by hand. This is virtually impossible to automate. Maybe you could show me exactly which page and link you are trying to download? Perhaps I could find some other way.

Or are there many links?

Best regards,
Oleg Chernavin
MP Staff
Joane 10/11/2010 01:45 pm
Thank you for your reply, Oleg!

I think I should have formulated the problem more generally: how can I get a link out of JavaScript that OE is not able to evaluate after it downloads the page containing that script?

Obviously, it doesn't take too much effort for a developer to make his site immune to OE. All he has to do is use JavaScript instead of simple "a href" tags, add some frames/iframes here and there, and put as many scripts as possible in external .js files. That way the available methods, like checking "Evaluate script calculations" and including the line "additional=ParseIncludedScripts", have no effect at all.

So I see only one solution to such a problem:
1. Load the page into the internal browser.
2. Get the internal browser to execute the script (obviously, it is necessary to know what the script is).
3. Get OE to intercept the URL which is about to load into the internal browser.
4. Add that URL (i.e. GET or POST request) and cookies into the url field of a given project.
5. The only way to load hundreds of pages into the internal browser one at a time is to use OLE automation. Besides, the URL of each next page might depend on what was just downloaded.
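
Steps 2 and 3 above can be illustrated outside Offline Explorer with a minimal JavaScript sketch. It is not how OE or TWebBrowser work internally. It only shows the idea of running a known call such as getnextUrl('') against a stand-in "location" object that records the URL instead of navigating. The body of getnextUrl, the example.com address, and the helper names createLocationRecorder and interceptNavigation are all hypothetical.

```javascript
// Stand-in for the browser's "location" object: it records every URL
// a script tries to navigate to instead of actually loading it.
function createLocationRecorder() {
  const captured = [];
  const location = {
    set href(url) { captured.push(String(url)); },
    get href() { return captured.length ? captured[captured.length - 1] : ""; },
    assign(url) { captured.push(String(url)); },
    replace(url) { captured.push(String(url)); },
  };
  return { location, captured };
}

// Runs the page's script text followed by the known call, with "window"
// and "location" bound to the recorder, and returns the captured URLs.
function interceptNavigation(pageScript, knownCall) {
  const { location, captured } = createLocationRecorder();
  const fakeWindow = { location };
  const run = new Function("window", "location", pageScript + "\n" + knownCall + ";");
  run(fakeWindow, location);
  return captured;
}

// Hypothetical page script: the real getnextUrl body is not shown in this thread.
const pageScript = `
  function getnextUrl(suffix) {
    window.location.href = "http://example.com/page?id=" + (41 + 1) + suffix;
  }
`;

const urls = interceptNavigation(pageScript, "getnextUrl('')");
console.log(urls);
```

A real page script would also touch the DOM, cookies and forms, which is why a full browser control is needed in practice. The sketch only covers the "execute, then intercept the URL" part.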

You've said that this might not be so simple. So I'd be glad to assist you as much as I can (you've got my e-mail).

And here's my first idea.
If executing JavaScript code in the internal browser seems to you the most challenging part, then (if I'm right that OEE is written in Delphi and the internal browser is a TWebBrowser) I've found how to execute JavaScript code on a TWebBrowser document in Delphi. You can see it here:

http://delphi.about.com/od/adptips2006/qt/wb_execscript.htm.

I hope I was thinking in the right direction.

Thank you. Joane.

P.S. I've noticed that if I start a project from my Delphi application through OLE, and that project has a "Browsebeforewithdelay" line, no page is loaded into the internal browser.
Oleg Chernavin 10/11/2010 01:48 pm
Yes, the problem is exactly in knowing which script should be executed. Offline Explorer can already execute scripts in 2 different ways. But knowing that a particular menu at particular pixel coordinates should trigger a certain script with certain parameters is what is very hard.

The Internal browser toolbar has the AutoSave button, which turns on a semi-automatic mode for loading missing links: you select a Project, click that button and browse the site offline. When you click a link, or a script runs that follows a link, and the target is not yet downloaded, Offline Explorer loads it and adds it to the Project.

Oleg.
Joane 10/12/2010 12:20 pm
I absolutely agree with you. That is why my main point was that the script itself is known. It is, for example, onclick=getnextUrl(''), and it is the same for all the pages (and, like I said, OEE is not able to parse it).
Oleg Chernavin 10/12/2010 12:37 pm
Can you show me an example of such a page with that script?

Oleg.
Joane 10/12/2010 12:38 pm
You have mentioned the AutoSave button. Well, it requires manual input, and that is just impossible with hundreds or thousands of starting pages.

And, by the way, I've never been able to paste any URL longer than approximately 130 characters into the internal browser's combobox. That is roughly one third of all the URLs I usually try to browse. A similar thing happens with addresses in the URL field of the project, where I think the limit is about 1000-2000 characters. In that case the address can be pasted, but no file is downloaded after I start the project (the same address works just fine in IE).

Joane.
Oleg Chernavin 10/12/2010 12:50 pm
Please give me examples to test. Also, the long URL examples would help. Thank you!

Oleg.
Joane 10/12/2010 02:59 pm
Let's take, for instance, the __doPostBack('lkbAthensLogin','') script on the page
http://www.netlibrary.com
("Athens Users, Log in here"). OE can't get the link to the Athens site.

I know you've been trying to improve OE's ability to parse __doPostBack scripts. But that is absolutely not the point. The point is how to execute some modified script on that page, let's say __doPostBack('AthensLogin2',''), or even a script that does not exist on that page but can be added to it by Ajax.

I'll send the link that I cannot paste in the URL field of the internal browser to your e-mail in a couple of minutes.



Oleg Chernavin 10/14/2010 02:26 pm
Actually, this link is supported. I just loaded it with the latest version and Offline Explorer converted the __doPostBack('lkbAthensLogin','') script to a static link:

.../www.netlibrary.com/Gateway.aspx@__EVENTTARGET=lkbAthensLogin&__EVENTARGUMENT=

The __doPostBack links should be fully supported. When ASPX pages introduce a new variant of such links, I update my code to support it as well.
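
A minimal JavaScript sketch of this kind of conversion, for illustration only. It is not Offline Explorer's actual parser. It assumes the two __doPostBack arguments are quoted string literals and that they can be passed as the __EVENTTARGET and __EVENTARGUMENT query parameters, as in the static link above. It also assumes the "@" in that saved link stands in for "?" in the original URL. The helper names parseDoPostBack and toStaticLink are hypothetical.

```javascript
// Hypothetical helper: extracts the two string arguments of a
// __doPostBack('target','argument') call, or returns null if none is found.
function parseDoPostBack(scriptText) {
  const pattern = /__doPostBack\(\s*(['"])(.*?)\1\s*,\s*(['"])(.*?)\3\s*\)/;
  const match = pattern.exec(scriptText);
  if (!match) {
    return null; // not a __doPostBack call with two quoted arguments
  }
  return { eventTarget: match[2], eventArgument: match[4] };
}

// Builds a static link that carries the postback fields as query parameters.
function toStaticLink(pageUrl, scriptText) {
  const parsed = parseDoPostBack(scriptText);
  if (!parsed) {
    return null;
  }
  const query =
    "__EVENTTARGET=" + encodeURIComponent(parsed.eventTarget) +
    "&__EVENTARGUMENT=" + encodeURIComponent(parsed.eventArgument);
  return pageUrl + "?" + query;
}

const link = toStaticLink(
  "http://www.netlibrary.com/Gateway.aspx",
  "javascript:__doPostBack('lkbAthensLogin','')"
);
console.log(link);
```

A real postback normally submits the page's form, including hidden fields such as the view state, so a plain query-string link like this may not be accepted by every ASPX page.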

Oleg.
Joane 10/15/2010 05:03 pm
Yes, you are right. The latest version of OEE parses the __doPostBack script correctly.

Meanwhile, I've come up with my own solution to this problem. I just put a TWebBrowser on the form in my Delphi application and coded the execution of the JavaScript with any parameters I want.
It is still a little cumbersome to get the cookies and the URL after the script is executed and copy them to the project in OEE. But it works.

Thank you for paying attention to this problem and always trying to make OE better!

Joane.
Oleg Chernavin 10/15/2010 05:53 pm
Thank you! Yes, I agree that your method works. There are only two issues:

1. Knowing exactly which script to execute and with what parameters.
2. Making this kind of parsing fast, because loading a page in TWebBrowser usually takes a lot of time.

This is why I still haven't implemented this approach.

Oleg.