problems with a weird protected forum

Author Message
TJ 08/06/2008 05:46 pm
Hi,

I am having problems spidering this forum:
http://forum.islambase.co.uk/index.php?showforum=18

I have a username and password, which I can send you by email. After logging in to the forum, I can view it in the internal browser of OE 4.9. However, the spidered page is still the login page.

I have tried the "Alt+ctrl" function and the username:password@url method. The cookie option is selected, and I also put my username and password in the Password section. None of these worked.

Thanks for your help.
Oleg Chernavin 08/07/2008 04:37 am
Usually you just have to log on to the site in the internal browser and then start downloading it. Just add the following line to the Project's URLs field:

IgnoreLogoutLinks

Best regards,
Oleg Chernavin
MP Staff
TJ 08/07/2008 02:27 pm
Hi Oleg,

I have tried what you suggested but it still does not work. I logged into the forum in the internal browser and I can view the page from the internal browser. IgnoreLogoutLinks has been added. But the downloaded page was still the login page.

That forum is really weird. Can you try to spider it using my account? Which email address should I send my account info to? Thank you.

Oleg Chernavin 08/07/2008 03:12 pm
Yes, please send me the details to support@metaproducts.com. I will try to make the download myself and see what could be wrong there.

Oleg.
TJ 08/18/2008 03:07 pm
Hi Oleg,

I received your email and used your settings:
http://forum.islambase.co.uk/index.php?showforum=18

Referer=http://forum.islambase.co.uk/index.php

IgnoreLogoutLinks

But it still does not work. The downloaded page "index.php@showforum=18" still shows the login form.

Are you using Pro 4.9? If yes, can you try again and make sure the page "index.php@showforum=18" and all the other pages in this forum are downloaded? Thanks.

Oleg Chernavin 08/18/2008 03:21 pm
For version 4.9, please open the Options dialog, go to the Proxy section, and check the NTLM box. Click the OK button, log on to the site in the internal browser again, and repeat the download.

Oleg.
TJ 08/18/2008 03:39 pm
That works. Thanks a lot!

For which types of forums or websites should I check the NTLM box? And in your settings, what is the function of "Referer"?

Oleg Chernavin 08/18/2008 03:43 pm
Make NTLM the default. It is compatible with all Web sites I know of. Version 5.0 has it checked by default when it is installed.

Referer tells the site that you came from that page, not just typed its address manually.

Oleg.
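
To illustrate what the Referer setting above does at the HTTP level, here is a minimal Python sketch using only the standard library. It is not how Offline Explorer works internally; the helper name `build_request` is hypothetical, and the request is only constructed, never sent.

```python
from urllib.request import Request


def build_request(url, referer):
    """Build a GET request that carries a Referer header.

    The Referer header tells the server which page the visitor
    supposedly came from, as opposed to a directly typed address.
    """
    return Request(url, headers={"Referer": referer})


# URLs taken from the settings quoted in this thread.
req = build_request(
    "http://forum.islambase.co.uk/index.php?showforum=18",
    "http://forum.islambase.co.uk/index.php",
)
print(req.get_header("Referer"))
```

Some sites reject requests for inner pages that arrive without a Referer from the same site, which may be why the setting was suggested here.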
TJ 08/18/2008 03:59 pm
Then why do I need to set the Referer? It's just an inlink, I think.

TJ 08/18/2008 05:49 pm
Hi Oleg,

Sorry to bother you again. After spidering the forum for a while, I ran into another problem. The downloaded pages show errors that seem to be related to cookies:

Bad Request
Your browser sent a request that this server could not understand.
Size of a request header field exceeds server limit.


Cookie: session_id=eeb45ec4b80ced7d322b953633a88b1c; modtids=%2C;***********(very long here)

--------------------------------------------------------------------------------

Apache/1.3.41 Server at serve110.globalmediahost.com Port 80

What should I do? I have tried deleting all the cookies and spidering again. It didn't work.
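
As a rough way to reason about the "Size of a request header field exceeds server limit" error above, the sketch below estimates the size of a Cookie header and compares it with a limit. It assumes Apache's documented default `LimitRequestFieldSize` of 8190 bytes; the actual limit on this server is unknown, and the function names are hypothetical.

```python
# Assumption: Apache's default LimitRequestFieldSize (8190 bytes).
# The server in this thread may be configured differently.
ASSUMED_FIELD_LIMIT = 8190


def cookie_header_size(cookies):
    """Return the byte length of the Cookie header line built from a dict."""
    pairs = "; ".join("{}={}".format(name, value) for name, value in cookies.items())
    header = "Cookie: " + pairs
    return len(header.encode("latin-1"))


def exceeds_limit(cookies, limit=ASSUMED_FIELD_LIMIT):
    """True if the Cookie header would be longer than the assumed limit."""
    return cookie_header_size(cookies) > limit


# The two cookies visible in the error page (the rest was elided there).
cookies = {
    "session_id": "eeb45ec4b80ced7d322b953633a88b1c",
    "modtids": "%2C",
}
print(cookie_header_size(cookies), exceeds_limit(cookies))
```

If a cookie value grows with every page visited, the header can eventually cross such a limit even though the first pages download fine.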

Oleg Chernavin 08/19/2008 04:31 am
So when the download starts, it gets pages fine, but then errors start coming in?

Oleg.
TJ 08/19/2008 08:25 pm
Yes. The first 300 pages download fine. Then I get the error.

Oleg Chernavin 08/20/2008 04:53 am
I think the site simply has some limit. It is not designed for viewing more than 300 pages in one session. Maybe it is possible to split the download task into parts?

Oleg.
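
One way to picture splitting the download into parts, as suggested above, is to divide a list of page URLs into batches that each stay under the roughly 300-page point where errors began. This is only a sketch: the batch size of 250, the function name `split_into_batches`, and the example.com URLs are all hypothetical, and each batch would still have to be set up as a separate download by hand.

```python
def split_into_batches(urls, batch_size=250):
    """Split a list of URLs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


# Hypothetical page list; real forum URLs would go here instead.
urls = ["http://example.com/page?id={}".format(n) for n in range(650)]
batches = split_into_batches(urls)
for number, batch in enumerate(batches, start=1):
    print("batch", number, "has", len(batch), "pages")
```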