Wayback Machine webcapture

Author Message
bryan 10/29/2008 03:26 pm
Hello,

I am evaluating the trial version of Offline Explorer Pro. I was wondering whether I can capture a website archived on the Wayback Machine. I only want to capture one specific website. Whenever I try, the most I get is the home page of the URL without any images. When I click a tab on the website, I usually get a "not in archive" message.


Thank you.
Oleg Chernavin 10/29/2008 03:36 pm
Can you give me a URL of this site? I will see what can be done.

Best regards,
Oleg Chernavin
MP Staff
bryan 10/29/2008 03:42 pm
> Can you give me a URL of this site? I will see what can be done.
>
> Best regards,
> Oleg Chernavin
> MP Staff

http://web.archive.org/web/20071110043408/http://www.ancoratech.com/

Thank You.
Oleg Chernavin 10/31/2008 01:14 pm
Thank you! I improved Offline Explorer to support this site. Can you please update your oe.exe file with this one?

http://www.metaproducts.com/download/betas/OEP2860.ZIP

Redownload your project and let me know how it works.

Oleg.
bryan 10/31/2008 03:50 pm
Thank you!

It seems the website is down. Once it comes back online I will test your patch out and let you know how it works.

Thanks a lot for working on this.
Oleg Chernavin 10/31/2008 06:27 pm
It looks like the site works, but it is slow and unstable.

Oleg.
Bryan 11/03/2008 09:35 am
I just tried the patch and it didn't work for me. It starts loading other websites that are on http://web.archive.org/. It is also downloading different dates for the website that I want. The website I want to download was saved on about 20 different dates on the Wayback Machine. I only want to download the snapshot with this date and time: 20071110043408.

To run the patched version, I just replaced the original exe file that was already in the Offline Explorer Pro folder on my computer with the patched exe file. I then started the program from the patched exe file.

I used the default settings, except that I enabled downloading all files and loading files only within the starting server. Are there specific settings that you used to get it to work, along with the patch?

Thank You for the help.
Oleg Chernavin 11/03/2008 04:09 pm
Yes, there are many dates for this site. However, it is hard to filter, because every page is hosted in a different "date" directory. Is there any way to tell the difference between the dates you want and the ones you don't want? If yes, I will help you to make URL Filters - Directory keywords.

Oleg.
Bryan 11/03/2008 04:29 pm
http://web.archive.org/web/*/http:/ancoratech.com

Preferably, I would like to be able to download each date separately.

Thanks.
Oleg Chernavin 11/04/2008 05:28 am
I think you simply need to copy a starting date, like:

http://web.archive.org/web/20071110043408/http://www.ancoratech.com/

Create a new Project using it with:

/www.ancoratech.com/

in the URL Filters - Directory - Included keywords list and download it.

In fact, it will load some pages from other dates, because they were not changed since that date. For example, follow the above link and then click SOLUTION in the site menu. You will get a different date for that page:

http://web.archive.org/web/20060911155742/www.ancoratech.com/solution.htm

Clicking HOME from that page gives yet another date:

http://web.archive.org/web/20060826004653/www.ancoratech.com/index.htm

So, this site doesn't allow dates to be isolated at all.
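
The URL layout described above can be sketched in Python. This is only an illustration and not how Offline Explorer works internally: the function names are made up, the www.example.com link is a hypothetical off-site URL, and the keyword check is a plain substring test standing in for the "Included keywords" filter.

```python
import re

# Wayback Machine snapshot URLs seen in this thread have the form:
#   http://web.archive.org/web/<14-digit timestamp>/<original URL>
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")

def split_wayback_url(url):
    """Return (timestamp, original_url), or None if url is not a snapshot URL."""
    match = WAYBACK_RE.match(url)
    if match is None:
        return None
    return match.group(1), match.group(2)

def passes_keyword_filter(url, keyword="/www.ancoratech.com/"):
    """Stand-in for an included-keywords filter: keep URLs containing the keyword."""
    return keyword in url

links = [
    "http://web.archive.org/web/20071110043408/http://www.ancoratech.com/",
    "http://web.archive.org/web/20060911155742/www.ancoratech.com/solution.htm",
    "http://web.archive.org/web/20060826004653/www.ancoratech.com/index.htm",
    "http://web.archive.org/web/20071110043408/http://www.example.com/",  # hypothetical
]
for link in links:
    print(split_wayback_url(link), passes_keyword_filter(link))
```

The first three links carry three different timestamps but all contain the keyword, which is why filtering on the site name works while filtering on a single date does not.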

Oleg.
Bryan 11/04/2008 10:08 am
I was able to download around 400 files from the website I wanted. But when I tried to open the site offline and browse through it, I could not. It kept saying "file not found". Also, when I opened the GIFs, no pictures appeared. Was this the same for you?

Thanks.
bryan 11/04/2008 11:06 am
Never mind. I was able to get it to work perfectly.

Thank you so much for all your help!!
Oleg Chernavin 11/04/2008 12:49 pm
OK. Great! What was the problem?

Oleg.
Bryan 11/04/2008 12:56 pm
I had changed the path where the new files are downloaded to and forgot about it. So I was clicking on the old default.htm file from one of the previous runs. Complete user error on my part.

Quick question. When I put /www.ancoratech.com/ in the URL Filters - Directory filters to have it included, do I check the box that says "load only files within the starting directory"? I have run it twice so far without checking that box and it has worked great. I was just curious whether that is what you were doing, or whether you were leaving it unchecked too.

Thanks again. Amazing customer service!!
Oleg Chernavin 11/04/2008 01:21 pm
Yes, it is necessary to uncheck it, because when you click any link, the server redirects you to another directory with another date.
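
A small Python sketch of why a "starting directory" restriction would reject those redirected pages. The helper names are invented for illustration, the images/logo.gif path is a hypothetical example, and the real option in Offline Explorer may apply different rules.

```python
from urllib.parse import urlsplit

def directory_of(url):
    """Return host plus directory part of a URL (everything up to the last '/')."""
    parts = urlsplit(url)
    return parts.netloc + parts.path.rsplit("/", 1)[0] + "/"

def within_starting_directory(start_url, link):
    """True if link lives in the starting URL's directory or below it."""
    return directory_of(link).startswith(directory_of(start_url))

start = "http://web.archive.org/web/20071110043408/http://www.ancoratech.com/"
redirected = "http://web.archive.org/web/20060911155742/www.ancoratech.com/solution.htm"
# Hypothetical file under the same snapshot date:
same_date = "http://web.archive.org/web/20071110043408/http://www.ancoratech.com/images/logo.gif"

print(within_starting_directory(start, redirected))  # different date directory
print(within_starting_directory(start, same_date))   # same date directory
```

Because the date is part of the directory path, a page served from another snapshot date counts as being outside the starting directory even though it belongs to the same site.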

Oleg.
Bryan 11/04/2008 01:28 pm
That makes sense.

Thanks again for all the help.
Oleg Chernavin 11/04/2008 03:20 pm
You are welcome!

Oleg.
Bryan 11/04/2008 04:31 pm
Oleg,

I just got an error that said "List index out of bounds(3)" when I was downloading a website. The download didn't stop or anything like that. I clicked OK and it just continued capturing the website. What does the error mean? I wasn't sure if it had anything to do with the patch which is why I am asking in here instead of making a new topic about it.

Thanks.
Oleg Chernavin 11/05/2008 02:14 am
Did it happen during the download or when performing any other operation? If during the download, approximately how many files did it download before that error?

Oleg.
Bryan 11/05/2008 01:52 pm
It happened during the download. I was not performing any actions when it occurred. It was about 2,000 files into the capture.

Thanks.
Oleg Chernavin 11/05/2008 03:03 pm
If it happens again with some project, I would ask you to tell me the Project settings, so I could reproduce them.

Oleg.
Bryan 11/05/2008 04:49 pm
I will do that. Thanks.
Bryan 11/06/2008 04:25 pm
Hello Oleg,

I am getting an error. The application is shutting down by itself after running for a while. Besides capturing the website, I was not performing any other actions. It just recently shut down after capturing around 63,000 files from one website. In the Event Viewer in Windows Server 2003 64-bit I am getting this as the error:

Faulting application OE.EXE, version 5.2.0.2860, faulting module kernel32.dll, version 5.2.3790.4062, fault address 0x00022366.

Let me know if you need any other information.

Thank you.
Bryan 11/07/2008 10:21 am
Oleg,

Just got another error. This time it was just a popup where I had to hit OK. It didn't stop the program at all. It continued to run after I hit OK too. The error was:

Access violation at address 006F427B in module 'OE.exe'. Read of address 6C0F592C.

Thanks.
bryan 11/07/2008 10:47 am
Thank you.
It does. It consumes about 178,980K of memory. The status bar says "downloading" and occasionally flashes "parsing", but it is only ever parsing 1 or 2 things. The only time I've seen it parsing a lot of messages was when I used the option "Download missing files".

I will try unchecking the "Use alternative connections method" box.
Oleg Chernavin 11/07/2008 10:53 am
This parsing number is correct. 180 megabytes is a high amount, but it is not unusual, because a lot of memory is used for the map of the Project and for the internal lists that enhance performance, prevent downloading the same links again, etc.
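
As a rough illustration of the "do not download the same links again" idea, here is a minimal Python sketch of a crawl over an in-memory link graph that keeps a set of every URL already queued. It is not Offline Explorer's actual data structure; the page names (including about.htm) and the function name are made up.

```python
from collections import deque

def crawl_order(start, links_by_page):
    """Breadth-first walk over a link graph that visits each URL only once."""
    seen = {start}          # every URL ever queued; grows with the size of the site
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in links_by_page.get(page, []):
            if link not in seen:   # skip links that were already queued
                seen.add(link)
                queue.append(link)
    return order

# Tiny hypothetical site where every page links back to the others.
site = {
    "/index.htm": ["/solution.htm", "/about.htm"],
    "/solution.htm": ["/index.htm", "/about.htm"],
    "/about.htm": ["/index.htm"],
}
print(crawl_order("/index.htm", site))
```

The `seen` set has to hold every URL for the whole run, so its memory use grows with the number of files in the Project.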

Oleg.
Bryan 11/07/2008 04:19 pm
I am getting a new problem now. I am unable to download a protected forum. I am registered to the forum. I have the webpage up and I am logged in and can browse the forum. I put my username and password in the password section of OE. But when I go to download the page, I just get the login page. Need help.

Thank you.



Oleg Chernavin 11/08/2008 04:40 am
It is easy to download this type of site.
You need to browse to the logon page of the site using the internal Browser of Offline Explorer Pro.

If you need to download the site immediately and only once, you can proceed with the logon and begin downloading the desired pages using Offline Explorer Pro. The program will use the session cookies of the logged on site from the internal browser.

You can also record the logon form contents in a Project, so that Offline Explorer Pro will know how to log itself on whenever you wish to download the site. This is useful when you want to schedule the site download or perform it later, or if you want to update the downloaded site in the future.

Once you have entered your username and password on the logon page in the internal browser, press and hold the Alt + Ctrl keys on your keyboard, click the Logon (or Submit) button in the Web form and release the keyboard buttons. You should get a new Project that contains the Web form information recorded in the URL field.

Adjust the Project settings as you wish (set the Level and other parameters) and click the OK button to save the Project. You may begin downloading at any time.

It is also recommended to make sure that the "Use alternate connection method" box is checked in the Internet tab of the Ribbon.

Note: The form recording method is supported only in the Pro and Enterprise editions of Offline Explorer.
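
For readers who want to see what a recorded logon amounts to at the HTTP level, here is a minimal Python sketch using only the standard library. It is an assumption-laden illustration, not Offline Explorer's mechanism: the URL, the field names and the credentials are hypothetical, and a real forum defines its own form fields.

```python
import http.cookiejar
import urllib.parse
import urllib.request

def build_login_request(login_url, form_fields):
    """Build (but do not send) a POST request carrying the logon form fields."""
    body = urllib.parse.urlencode(form_fields).encode("ascii")
    return urllib.request.Request(login_url, data=body, method="POST")

def make_session():
    """Return an opener that stores cookies and sends them back on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

# Hypothetical URL and field names, for illustration only.
request = build_login_request(
    "http://forum.example.com/login.php",
    {"username": "alice", "password": "secret"},
)
opener, jar = make_session()
# opener.open(request) would submit the form; later opener.open(page_url)
# calls would then carry whatever session cookies the server set.
```

Nothing is sent over the network until `opener.open(...)` is called, so the sketch can be inspected safely before pointing it at a real site.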

Oleg.
Oleg Chernavin 11/08/2008 06:02 am
Can you please also download the updated version here:

http://www.metaproducts.com/download/betas/oep2869.zip

Unzip the file and replace the old oe.exe file with the new one. This may help to improve memory usage.

Oleg.
Bryan 11/11/2008 12:57 pm
I downloaded the patch and am using it now. It seems to be helping, but I will test it further and let you know later today or tomorrow how it holds up.

Also, I tried both methods you described for capturing a protected forum and neither of them worked. I tried both methods on one forum and it just keeps capturing the website as if it's not logged on to the forum, so I don't get much except the login page.

I tried it on another forum and when I open default.htm to see the website, it won't load. It says Internet Explorer cannot display the webpage. I've never had this problem before; I've always just had the above problem of only getting the login page.

It seems that when I log in to the website through the internal browser of OEP I am able to browse the site with no problems, but as soon as I start the capture of the site, I get logged out of the website in the browser. This could possibly be why I keep capturing only the login page, or the home page without being logged in.

Thanks!
Bryan 11/11/2008 12:58 pm
oh and I do have a license for OE Pro.
Oleg Chernavin 11/11/2008 01:28 pm
Can you try to add the following line to the URLs field of the Project:

IgnoreLogoutLinks

Would it help? If not, send me the details to support@metaproducts.com.

Oleg.
Bryan 11/11/2008 01:37 pm
> Can you try to add the following line to the URLs field of the Project:
>
> IgnoreLogoutLinks
>
> Would it help? If not, send me the details to support@metaproducts.com.
>
> Oleg.

So I would just add this line after the URL address, in the same box?
Oleg Chernavin 11/11/2008 01:58 pm
Yes, this box is multiline - so this should be on a separate line after the URL. It will become bold once you enter it.
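
The thread does not say how IgnoreLogoutLinks works internally. Its name suggests that the crawler skips links that would end the logged-in session, and the general idea can be sketched in Python as below. The marker list, the function names and the forum.example.com URLs are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumed markers; real forums name their logout links in many different ways.
LOGOUT_MARKERS = ("logout", "logoff", "signout", "sign_out", "log_out")

def looks_like_logout(url):
    """Guess whether following this link would end the logged-in session."""
    parts = urlsplit(url.lower())
    target = parts.path + "?" + parts.query
    return any(marker in target for marker in LOGOUT_MARKERS)

def links_to_follow(links):
    """Drop links that look like logout links, keep everything else."""
    return [link for link in links if not looks_like_logout(link)]

# Hypothetical links found on a forum page.
found = [
    "http://forum.example.com/viewtopic.php?t=12",
    "http://forum.example.com/login.php?do=logout",
    "http://forum.example.com/index.php",
]
print(links_to_follow(found))
```

A crawler that follows every link on a page would otherwise request the logout URL early in the run and lose its session for all later pages.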

Oleg.
Bryan 11/12/2008 11:26 am
I was able to download everything on the forum using IgnoreLogoutLinks.

When I open the default.htm file, it takes me to a page with the website link on it, which I would usually click to open the website offline, but that link does not work. Internet Explorer basically says "Internet Explorer cannot display the webpage". I was able to work around this by finding another default.htm file. That one opened directly to the webpage, and I was then able to browse offline without any issues.

When I downloaded another forum, the same thing happened with the first default.htm file. However, there were no more default.htm files for me to open. So essentially I cannot browse the webpage offline.

The first default.htm file used to always work and let me browse the website offline, but now it doesn't seem to work with the forums. Is there a way to resolve this?

Also, when I tried downloading 2 of the forums at once, I got an "Invalid Pointer Operation" error. Downloading one at a time didn't cause this error.

The patch you had me download works great though. I only get about 50K memory usage now with it. No issues with it shutting down.

Thanks!
Oleg Chernavin 11/12/2008 12:19 pm
It is easy to browse the site inside Offline Explorer. If you want to open a file directly from the disk, find default.htm file in the download directory. It will contain links to all starting pages of your Projects.

An easy way to see the location of the Download Directory is to select a Project, click the Properties button and press the Ctrl key on the keyboard. The path will show in the lower-left corner of the dialog.

Another way is to select a folder that contains the Projects and click Properties to see the corresponding setting.

Oleg.
Bryan 11/12/2008 12:28 pm
The default.htm is opening but the link to the starting page of my project is not working. Internet Explorer is giving me the error, "Internet Explorer Cannot Display the Webpage". So I am unable to browse the webpage offline.
Oleg Chernavin 11/12/2008 12:38 pm
Can you give me the URL of the starting page, so I could download it and see why it happens?

Oleg.
Bryan 11/12/2008 12:42 pm
http://www.easyadforum.com/login.php
Oleg Chernavin 11/12/2008 01:21 pm
I think I will need the password. I assume this problem happens because you have a POST request in the URLs field which passes the username/password to the forum.

If you want, you can send it to support@metaproducts.com.

Oleg.
Bryan 11/12/2008 02:22 pm
Would you be able to just make your own username and password? Registration is simple and free. Thanks.
Bryan 11/13/2008 04:04 pm
Oleg,

I emailed you a username and password to use. Did you get it?

Just wanted to make sure.
Oleg Chernavin 11/14/2008 05:27 am
I am sorry, I have not been able to fix it yet. Please excuse me for the delay.

Oleg.
Bryan 11/14/2008 09:04 am
Thanks for the update Oleg.

It's alright. I at least know that you got the username/password I sent you and that you are working on it.

Thanks again for all the help.
Oleg Chernavin 11/14/2008 10:50 am
OK. I made a change and it should now correctly make the default.htm paths. I just released the new version:

http://www.metaproducts.com/download/opsetup.exe

Oleg.
Bryan 11/14/2008 01:16 pm
Thank you so much Oleg!! The new version works great!

Not only is this product great, but it continues to get better thanks to your excellent customer support.
Oleg Chernavin 11/14/2008 01:28 pm
It is thanks to our customers, because I develop Offline Explorer with their help. Without their suggestions and reports I would have nothing to develop.

So, this is a win-win cooperation.

Oleg.