Problem with crawling https site authenticated pages


Supun

08/21/2009 06:42 am

I have started using this software to crawl a secure (HTTPS) web site. The problem I'm facing is that it does not crawl the pages that require the user to be authenticated. The crawler captures only the pages that do not require authentication, for example help pages that can be viewed without a user name and password.

Please note that I have filled in the user name and password fields as well, but still no luck. Please let me know if there is a way to crawl all the pages in the site.

Thanks

Supun

Oleg Chernavin

08/21/2009 07:54 am

There are two kinds of password-protected Web sites. One type asks for a username and password in a standard Windows-type dialog (BASIC and NTLM authentication), while the other type requires you to log on directly on a Web page.

1. Web sites that require BASIC or NTLM authentication.

To download this type of site, either specify the username and password in the Project Properties dialog | Advanced | Password section, or type them directly in the URL, such as:
http://username:password@www.server.com/...

Notice that a colon separates the password from the username and the @ symbol separates them from the server name.
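
As a rough illustration of the URL format described above, here is a minimal Python sketch that assembles such a URL. The function name and its parameters are hypothetical and are not part of Offline Explorer. It also percent-encodes the username and password, on the assumption that characters such as ":" or "@" inside a password would otherwise be confused with the separators.

```python
from urllib.parse import quote


def url_with_credentials(username, password, host, path="/", scheme="http"):
    """Build scheme://username:password@host/path (hypothetical helper)."""
    # Percent-encode both parts so a ':' or '@' inside them is not
    # mistaken for the separators described above.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}{path}"


print(url_with_credentials("username", "password", "www.server.com"))
```

For an HTTPS site, the same helper can be called with scheme="https".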

Some sites use NTLM authentication, which looks like the above, but with a third box labeled "DOMAIN" along with the username and password. Enter the domain name along with the username in the same field in the Project Properties dialog | Passwords this way:

DOMAIN\username

The backslash symbol separates the domain name from the username. When you are done with the Project Properties changes, click the OK button to save them.
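
If you generate the DOMAIN\username value from a script, remember that the backslash usually has to be escaped in source code. The sketch below shows this in Python; both helper names are hypothetical and only illustrate the format.

```python
def format_ntlm_user(domain, username):
    """Join a domain and a username as DOMAIN\\username (hypothetical helper)."""
    # "\\" in Python source is a single backslash character.
    return f"{domain}\\{username}"


def split_ntlm_user(value):
    """Split DOMAIN\\username; the domain is empty if there is no backslash."""
    domain, sep, user = value.partition("\\")
    if not sep:
        return "", value
    return domain, user


print(format_ntlm_user("DOMAIN", "username"))
```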

Note: NTLM authentication is supported only in the Pro and Enterprise editions of Offline Explorer.

2. Authentication in a Web form.

It is easy to download this type of site. You need to browse to the logon page of the site using the internal browser of Offline Explorer Enterprise.

If you need to download the site immediately and only once, you can proceed with the logon and begin downloading the desired pages using Offline Explorer Enterprise. The program will use the session cookies that the internal browser received when you logged on to the site.

You can also record the logon form contents in a Project, so that Offline Explorer Enterprise will know how to log itself on whenever you wish to download the site. This is useful when you want to schedule the site download or perform it later, or if you want to update the downloaded site in the future.

Once you have entered your username and password on the logon page in the internal browser, press and hold the Alt + Ctrl keys on your keyboard, click the Logon (or Submit) button in the Web form and release the keyboard buttons. You should get a new Project that contains the Web form information recorded in the URL field.
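
To show the general idea behind form-based logon (submit the form once, then reuse the session cookie for later requests), here is a minimal Python sketch using only the standard library. It is not how Offline Explorer works internally. The helper names, the URLs, and the form field names are all assumptions made for illustration; a real site defines its own field names in the logon form.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies and sends them back later."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(form_url, fields):
    """Encode the form fields as a POST body, as a browser does on Submit."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(form_url, data=body, method="POST")


def fetch_after_login(form_url, fields, page_url):
    """Log on once, then request a protected page with the same cookies."""
    opener, _jar = make_session()
    opener.open(build_login_request(form_url, fields))  # cookies are stored
    with opener.open(page_url) as response:             # cookies are sent back
        return response.read()
```

A call might look like fetch_after_login("https://www.server.com/login", {"username": "alice", "password": "secret"}, "https://www.server.com/members/"), where the address and the field names are placeholders.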

Adjust the Project settings as you wish (set the Level and other parameters) and click the OK button to save the Project. You may begin downloading at any time.

It is also recommended to make sure that the "Use alternate connection method" box is checked in the Internet tab of the Ribbon.

Note: The form recording method is supported only in the Pro and Enterprise editions of Offline Explorer.

Best regards,
Oleg Chernavin
MP Staff
