Problem crawling authenticated pages on an HTTPS site

Author Message
Supun 08/21/2009 06:42 am
I have started to use this software to crawl a secure (HTTPS) web site. But the problem I'm currently facing is that it does not crawl the pages once the user has authenticated. (What I mean is that the crawler doesn't capture pages that require authentication, but it does capture pages that don't require it.) E.g., a web site can have help pages that need no username and password.

Please note that I have filled in the username and password fields as well, but still no luck. Please tell me whether there is a way to crawl all the pages on the site.


Oleg Chernavin 08/21/2009 07:54 am
There are two kinds of password-protected Web sites. One type asks for a username and password in a standard Windows-style dialog (BASIC and NTLM authentication), while the other type requires you to log on directly on a Web page.

1. Web sites that require BASIC or NTLM authentication.

To download this type of site, either specify the username and password in the Project Properties dialog | Advanced | Passwords section, or type them directly in the URL, such as:

https://username:password@www.example.com/

Notice that a colon separates the password from the username and the @ symbol separates them from the server name.
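As an illustration of how such credentials travel, here is a minimal Python sketch (the username, password, and server name are placeholders, not real values) showing how this URL form maps onto the BASIC `Authorization` header a browser would send:

```python
import base64
from urllib.parse import urlsplit

# Hypothetical credentials for illustration only.
url = "https://john:secret@www.example.com/private/"

# The colon separates the password from the username, and the @ sign
# separates the credentials from the server name.
parts = urlsplit(url)
username, password = parts.username, parts.password
print(username)  # john

# Under the hood, BASIC authentication sends the same pair as a
# base64-encoded "Authorization" request header.
token = base64.b64encode(f"{username}:{password}".encode()).decode()
auth_header = f"Basic {token}"
print(auth_header)
```

This is only to show what the URL syntax means; Offline Explorer builds the header for you when the credentials are supplied either way.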

Some sites use NTLM authentication, which looks like the above but adds a third box labeled "DOMAIN" to the username and password. Enter the domain name together with the username in the same field in the Project Properties dialog | Passwords section, this way:

DOMAIN\username

The backslash symbol separates the domain name from the username. When you are done with the Project Properties changes, click the OK button to save them.
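For illustration, the combined string can be taken apart the same way the program reads it; a small Python sketch with a made-up domain and user:

```python
# Hypothetical NTLM-style credential, as typed into the username field.
credential = r"CORP\john"

# The backslash separates the domain name from the username.
domain, _, user = credential.partition("\\")
print(domain)  # CORP
print(user)    # john
```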

Note: NTLM authentication is supported only in the Pro and Enterprise editions of Offline Explorer.

2. Authentication in a Web form.

It is easy to download this type of site.
You need to browse to the logon page of the site using the internal browser of Offline Explorer Enterprise.

If you need to download the site immediately and only once, you can proceed with the logon and begin downloading the desired pages using Offline Explorer Enterprise. The program will use the session cookies of the logged-on site from the internal browser.

You can also record the logon form contents in a Project, so that Offline Explorer Enterprise will know how to log itself on whenever you wish to download the site. This is useful when you want to schedule the site download or perform it later, or if you want to update the downloaded site in the future.

Once you have entered your username and password on the logon page in the internal browser, press and hold the Alt + Ctrl keys on your keyboard, click the Logon (or Submit) button in the Web form and release the keyboard buttons. You should get a new Project that contains the Web form information recorded in the URL field.
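Conceptually, what gets recorded is just the set of field names and values from the logon form. As a rough sketch of what such a recorded submission looks like when encoded, assuming hypothetical field names (`login`, `passwd`) that will differ from site to site:

```python
from urllib.parse import urlencode

# Hypothetical field names captured from a logon form; real sites
# name their inputs differently.
form_fields = {"login": "john", "passwd": "secret", "submit": "Logon"}

# The form contents are submitted as a URL-encoded body (or query string),
# which is what the recorded Project replays on each download.
body = urlencode(form_fields)
print(body)  # login=john&passwd=secret&submit=Logon
```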

Adjust the Project settings as you wish (set the Level and other parameters) and click the OK button to save the Project. You may begin downloading at any time.

It is also recommended to make sure that the "Use alternate connection method" box is checked on the Internet tab of the Ribbon.

Note: The form recording method is supported only in the Pro and Enterprise editions of Offline Explorer.

Best regards,
Oleg Chernavin
MP Staff