Problems downloading from AccessMedicine

Author Message
Kevork Kevorkian 02/07/2005 02:00 pm
It seems the site
http://www3.accessmedicine.com/home.aspx
has switched again to ASP, which together with the JavaScript has shattered my attempts to download a separate book or anything useful at all. Perhaps with some expert help downloading will be possible.

1. As all the pages are in one and the same directory, if I want to download Goodman & Gilman`s Pharmacology textbook, for example, whose separate documents are named "content.aspx?aID=410000" ... "content.aspx?aID=557397", what is the best way to do it? I used the following syntax in the included filename keywords of the URL filter:
content_main.aspx?aid={:410001..557397}
I`m aware each content_main file is linked to 3 separate iframe files content_nav, content_main and content_related aspx files respectively. I could repeat the procedure for these files too.
2. With or without any file filter restrictions, OE successfully downloads everything up to the container of the 3 iframes. Their sources are translated erratically and at random (on one occasion it downloaded the same content into all the queued content_main files), and as a consequence the main page shows either nothing or the wrong content. Something similar happens when I`m browsing the site through the OE internal browser. Interestingly, when I try to change the source of the middle (content_main) iframe manually, the parent page shows nothing there - perhaps the JavaScript has something to do with it.
3. There is again the log in problem. Is there a way for OE to detect the log in error - by the text displayed on the page or by the file size, for example - and in that case to attempt downloading those files later or log in again automatically? And is there a way to remove these "wrong" log in pages from an already existing project?
4. Is there a way, in case downloading the textbook by OE is not possible, to import files from Internet Explorer cache to an OE project?
My user name, which will be active for only a couple of days, is Vasalius and the password is cardiopc. It`s easy to obtain a new trial password (like mine) through a valid e-mail address.
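For illustration, here is a minimal Python sketch of what the {:410001..557397} range in question 1 stands for: one URL per number. This is not how Offline Explorer implements the macro; the helper name expand_range is hypothetical, and the example uses a three-number range because the full range covers 147,397 IDs.

```python
import re

# Matches one "{:start..end}" numeric range, as written in the post above.
RANGE_RE = re.compile(r"\{:(\d+)\.\.(\d+)\}")


def expand_range(template):
    """Yield one URL per number in the template's {:start..end} range.

    Hypothetical helper for illustration only; templates without a range
    are yielded unchanged.
    """
    match = RANGE_RE.search(template)
    if match is None:
        yield template
        return
    start, end = int(match.group(1)), int(match.group(2))
    for number in range(start, end + 1):
        yield template[:match.start()] + str(number) + template[match.end():]


if __name__ == "__main__":
    base = "http://www3.accessmedicine.com/content_main.aspx?aid={:410001..410003}"
    for url in expand_range(base):
        print(url)
```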
Oleg Chernavin 02/08/2005 08:31 am
The easiest way is to have the following starting URL:

http://www3.accessmedicine.com/resourceTOC.aspx?resourceID=28

Set Level to 3, add

contents

to the URL Filters | Filename | Included filename keywords list.

To make sure that logon error pages are not loaded, use Content Filters.

You can also search the Project for unwanted words using Edit | Find Contents feature and delete these pages.

Best regards,
Oleg Chernavin
MP Staff
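As a rough offline complement to the advice above, the following Python sketch scans a folder of already downloaded pages and flags the ones that look like logon or error pages, by text or by file size. It is not the Content Filters feature itself. The marker phrases, the size threshold and the function names are all assumptions to be replaced with what the site actually returns.

```python
from pathlib import Path

# Assumed marker phrases; replace with text actually shown on the error pages.
ERROR_MARKERS = ("server application error", "please log in")
# Assumed threshold: pages smaller than this are treated as suspicious.
MIN_SIZE_BYTES = 2048


def looks_like_error_page(path, markers=ERROR_MARKERS, min_size=MIN_SIZE_BYTES):
    """Return True if the file is very small or contains a marker phrase."""
    data = Path(path).read_bytes()
    if len(data) < min_size:
        return True
    text = data.decode("utf-8", errors="replace").lower()
    return any(marker in text for marker in markers)


def find_error_pages(folder):
    """Return a sorted list of files under folder that look like error pages."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and looks_like_error_page(p)
    )
```

The resulting list can then be used to decide which pages to delete from the Project or download again.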
Kevork Kevorkian 02/10/2005 02:55 am
Thank you for your reply, Oleg!
Your "contents" must stand for the common first part of the names of the files to be downloaded, right? In that case it should be "content" without the "s".
Unfortunately I could not configure my project not to download application error and log in pages, which are encountered very often.
On the extremely rare occasion when all page elements are downloaded, their content is wrong - most commonly one and the same content appears on many pages.
I also cannot open the javascript windows from the internal browser.
Oleg Chernavin 02/10/2005 07:44 am
> Your "contents" must stand for the common first part of the names of the files to be downloaded, right? In that case it should be "content" without the "s".

Yes, you are correct.

> Unfortunately I could not configure my project not to download application error and log in pages, which are encountered very often.

Maybe Offline Explorer

> On the extremely rare occasion when all page elements are downloaded, their content is wrong - most commonly one and the same content appears on many pages.
> I also cannot open the javascript windows from the internal browser.

Perhaps the server doesn`t like simultaneous access when several pages are being downloaded at the same time. Please try setting the number of connections to 1 in the Options dialog and redownload the Project.

Oleg.
Kevork Kevorkian 02/10/2005 02:01 pm
Setting the number of pages downloaded at a time to one reduces the chance of a login error, but I still have one and the same content in all the pages. It appears that this content is not chosen at random but rather depends on the pages I have browsed prior to starting the download.
Oleg Chernavin 02/12/2005 09:41 am
I tried to request a trial logon on the site, because yours is already expired, but for some reason they haven`t sent me an E-mail yet.

Oleg.
Kevork Kevorkian 02/12/2005 03:21 pm
Yes, they made it hard with their continuous changes. With the password below I truly hope you`ll find a solution:

Username: Cardiopc
Password: arvense

Oleg Chernavin 02/13/2005 03:40 pm
The password I requested still hasn`t arrived.

I am sorry, but I will be out of the office for a few days until Friday. Will the new password be valid at that time?

Oleg.
Kevork Kevorkian 02/14/2005 03:31 pm
I think there is only one free password for a certain e-mail address. My password will be valid till Saturday, I think. If you have trouble logging in, i.e. a server application error, please use the following URL:
http://www3.accessmedicine.com/login.aspx
Steve 02/23/2005 12:43 pm
Hi

I could download from the site earlier, but not anymore.
Is there any way?
Oleg Chernavin 02/24/2005 12:56 pm
Steve, I am going to try again with this server. My temporary password expired, so could you please give me yours?

Oleg.
Kevork Kevorkian 02/26/2005 03:18 am
Hello Oleg! Here is another 7-day log in information:

Username: moqkutiq
Password: resistance

Steve, even with the older version of the site, there was a problem with downloading of the pop up windows. Do you have any downloads from the previous version of the site?

> Steve, I am going to try again with this server. My temporary password expired, so could you please give me yours?
>
> Oleg.
Oleg Chernavin 02/28/2005 07:59 am
It looks like loading too many content_main.aspx?aID=xxx files simultaneously confuses the server. Please try to go to the Options dialog, set the number of connections to 1 and enter the following in the Delay between downloads field:

3-5

Click the OK button and log on to the site online in the Internal browser. Then select the Project, go to the Project Map, select all content_main.aspx?... files in the Map using Shift+click and click the Download button to redownload these files only.

This will take some time, but at least this way should work.

Oleg.
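The idea behind these settings (one connection, a random pause between requests) can be sketched in Python as below. This is only an illustration of the throttling pattern, not Offline Explorer`s code. It assumes the 3-5 delay is in seconds, and the function name and the injectable fetch and sleep parameters are hypothetical.

```python
import random
import time


def fetch_one_at_a_time(urls, fetch, min_delay=3.0, max_delay=5.0, sleep=time.sleep):
    """Call fetch(url) for each URL in order, never in parallel.

    A random pause between min_delay and max_delay (assumed to be seconds)
    is inserted before every request except the first. Both fetch and sleep
    are passed in so the pattern can be tried without touching the network.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(random.uniform(min_delay, max_delay))
        results[url] = fetch(url)
    return results
```

With roughly 4 seconds of pause per file, a few thousand files take several hours, which matches the remark that this will take some time.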
Kevork Kevorkian 03/02/2005 04:10 am
Thank you very much for taking your time and helping me, Oleg! This time at last OE loads content_main pages!
As some of the JavaScript source files are not translated correctly, is there a way to import files I have modified myself into an existing project? Or to "download" HTML documents and their related files from a local hard drive into an (existing or new) project?
Best regards:
Kevork
Oleg Chernavin 03/02/2005 04:22 am
You can locate the desired file in the Project Map and click Tools | Edit to change it and then save it to the disk. Anyway, can you please tell me more about which scripts were not translated properly? Perhaps I will be able to fix this.

Oleg.
Kevork Kevorkian 03/02/2005 04:52 am
P.S. If I manually change the contents of a file already downloaded by OE (e.g. with the Windows Notepad program), would this affect the project settings, i.e. file modification check, map info, project backup?
Is there a way to import new URLs into the queue list and, for example, add there the path to a missing graphic file and download just these new files selectively?
Kevork Kevorkian 03/02/2005 05:04 am
Thank you for your prompt answer, Oleg, I`ve just seen it.
My question was prompted while downloading
http://genetics.accessmedicine.com

user name: vesaliusandreas@yahoo.com
password: aaaaa

http://genetics.accessmedicine.com/mmbid/public/co_contents/toc.html
Above is the full TOC. The brown and gray graphic buttons represent rollover images. An external mmbid_rollovers.js is used to control the rollover. Interestingly, at the main TOC everything is translated correctly, while on the individual chapters, source properties point to another directory.
Kevork Kevorkian 03/02/2005 05:22 am
> Interestingly, at the main TOC everything is translated correctly, while on the individual chapters, source properties point to another directory.


I guess there is a conflict when translating absolute to relative paths, so the relative path works correctly only with the TOC page.
What I was trying was to create an additional duplicate all_images folder at the proper level, or alternatively to change the script and HTML source (or use absolute paths in the js file, as I`m going to export the project to a CD). But this way I`m not sure whether everything will be remembered if I make a backup file of my project.
Oleg Chernavin 03/02/2005 07:54 am
> P.S. If I manually change the contents of a file already downloaded by OE (e.g. with the Windows Notepad program), would this affect the project settings, i.e. file modification check, map info, project backup?

You can easily change any downloaded file. It will not change any Project settings. The file modification date is kept in descr.wd3 files.

> Is there a way to import new URL`s in the queue list and for example add there the path to a missing graphic file and download just these new files selectively?

You can add any number of URLs to the Project URLs field.

Oleg.
Kevork Kevorkian 03/02/2005 03:10 pm
> You can add any number of URLs to the Project URLs field.

And what about files on my local disk?
qdf 03/03/2005 04:24 am
I`m encountering the same problem... but I have found something useful which can help:
the URL www3.accessmedicine.com/popup.aspx?aID=410001&print=yes
allows downloading a single file, including the images and everything, in its printing version.
However, even if I use the internal browser and log in, I still don`t succeed in downloading these
URLs (from 410001-55....)

I`d be happy to hear your thoughts/ideas/syntax/tips on that.
qdf 03/03/2005 04:31 am
I`m interested in extracting the files from OE, and I will write a program to strip
all the HTML headers and concatenate the files into one large HTML file, which I will convert to PDF.

But for downloading the files, I need to know how to tell OE to export the files to a certain directory with the file name XXX.html, where XXX is the same as the aID used on this site.

10x again.
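A minimal Python sketch of the program described above (strip the HTML headers, join the files into one large HTML file) might look like this. It assumes the pages are already saved on disk; the function names are hypothetical, and the regular expression is a crude stand-in for a real HTML parser.

```python
import re
from pathlib import Path

# Crude body matcher; a real HTML parser would be more robust.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def extract_body(html):
    """Return the text between <body> and </body>, or the whole input if absent."""
    match = BODY_RE.search(html)
    return match.group(1).strip() if match else html.strip()


def concat_pages(paths, title="Combined pages"):
    """Join the bodies of the given HTML files into one HTML document."""
    parts = [
        extract_body(Path(p).read_text(encoding="utf-8", errors="replace"))
        for p in paths
    ]
    joined = "\n<hr>\n".join(parts)
    return f"<html><head><title>{title}</title></head>\n<body>\n{joined}\n</body></html>\n"
```

Passing the paths in aID order keeps the chapters in sequence. Image references inside the bodies are left untouched, so they only work if the combined file sits where the relative paths still resolve.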
qdf 03/03/2005 06:05 am
OK... I have partially succeeded.
I have downloaded all the files from popup.aspx using the OE {xxx..yyy} syntax, but no images were downloaded. What do I do?
Oleg Chernavin 03/03/2005 07:13 am
What are your Project settings? Regarding renaming HTML files - you can use the URL Substitutes feature for that in the Project Properties dialog | Advanced section.

Oleg.
Jo 11/06/2005 05:27 pm
hi Oleg.

Your software is fantastic. I struggled for days to get Blackwidow to get anything useful off the accessmedicine site, and gave up. Then I stumbled onto Offline Explorer, and after 10 minutes I was getting results. Unfortunately, after close inspection of the ~6000 files that make up one textbook, I realised that I had actually downloaded about 25 distinct files (6000 different filenames, but only 25 different contents in the files), with many copies of each. It seems that the server cannot handle being hammered with requests, and sends the right filename with the wrong content.
I slowed it right down, as you suggested at the beginning of this thread, and it helped, but there are still quite a few duplicates.
Any ideas on what I can do, other than forcing only 1 connection at a time and waiting 15-60 seconds between requests, as this will take close to 24 hours for each textbook?

thanks
jo
Jo 11/07/2005 04:46 am
hi Oleg,

Some more thoughts regarding how I can solve the problem of identical content in files with different names. If I run duplicate detection software such as noclone (www.noclone.net) on the directory that OE has downloaded the files to, I can easily see which files are identical, and redownload those.
The problem is that there are about 1200 that need to be redownloaded and they are not sequential, so I need to select them manually.
Noclone can export the duplicates it detects in CSV format.
Is there a way of feeding a list of files that already exist in the project map (such as from the CSV file) into OE to redownload them?
Perhaps using the command line?

thanks

jo
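The duplicate check described above can also be sketched in a few lines of Python: hash every downloaded file and group the files whose contents are byte-for-byte identical. This is a generic illustration, not how noclone works internally, and the function name is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_duplicates(folder):
    """Return lists of files under folder whose contents are identical.

    Files are compared by SHA-256 digest; only groups with more than one
    file are returned, each group sorted by path.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Every file in a returned group is a candidate for redownloading, since at most one of them can hold the content that belongs to its name.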
Oleg Chernavin 11/07/2005 05:32 am
You can easily copy these URLs to the clipboard and then paste them into the Project`s URLs field.

Oleg.
Jo 11/13/2005 12:16 pm
Oleg,
I hope you can help, as I have almost given up.
In an attempt to prevent the problems with the ASP server delivering the wrong file content under the correct file name, I have just downloaded an entire textbook, about 3500 files and 73MB, over 3 full days. This is because I set "number of connections" to 1 and "delay between downloads" to 7-13.
Then I used "URL filters-filenames" and included only "*good11*.gif" and "content*={:935800..956493}" to limit the download to only this book. The limiting works fine, but I am still getting more than 1000 incorrect files, essentially making the download unusable, as perhaps 60% of the pages in the book are actually the same. The problem only occurs in pages with 3 frames, i.e. content.aspx, content_nav.aspx and content_main.aspx. Those pages which are only content.aspx, because they are just chapter or section headings, all download perfectly. Furthermore, on the pages that are incorrect, the content.aspx part does contain the correct information all the time; it is just the content_nav.aspx and the content_main.aspx where the file contents do not download correctly.
Do you have any ideas?
I have just created a new login that will be valid for a week:
joregelt6
lactobacillus

thanks
jo
Oleg Chernavin 11/14/2005 06:59 am
It is very hard to overcome such servers. I would suggest you try adding the following line to the Project`s URLs field:

Additional=DepthFirst

This will reverse the order of the pages to download and may help. I am not 100% sure, though.

Oleg.
Jo 11/15/2005 05:12 am
hi Oleg

thanks for the fast response again.
I am not sure where you mean I should enter this text "Additional=DepthFirst".
Is it at the end of the line:
http://www.accessmedicine.com/resourceTOC.aspx?resourceID=28 in the "Project-Address URLs" field, or is it an entry on a new line in this field, or is it somewhere else completely?
Also, can you tell me very quickly what effect you expect this to have, and why you think I am having these problems?
Is their server set up to try and prevent the kind of downloads that Offline Explorer does, or is their ASP server buggy and that is why I am having these problems?

thanks

jo

Oleg Chernavin 11/15/2005 06:01 am
This should be placed on a separate line after the existing Project URL, like:

http://www.site.com/page.htm
Additional=DepthFirst

The above will cause Offline Explorer to load a page and then all links (images) from it and only then start loading the next page. I am not sure, but it may help for some sites.

Oleg.
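To illustrate the difference in download order, here is a small Python sketch comparing a breadth-first and a depth-first walk over a made-up link graph. The page names are hypothetical, and whether Offline Explorer`s actual queue matches either traversal exactly is an assumption.

```python
from collections import deque

# Hypothetical link graph: each page lists the files it links to.
LINKS = {
    "toc": ["page1", "page2"],
    "page1": ["page1_nav", "page1_main"],
    "page2": ["page2_nav", "page2_main"],
}


def breadth_first(start, links):
    """Visit all pages of one level before any of their linked files."""
    order, queue, seen = [], deque([start]), {start}
    while queue:
        url = queue.popleft()
        order.append(url)
        for child in links.get(url, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


def depth_first(start, links):
    """Visit a page, then everything it links to, before the next page."""
    order, stack, seen = [], [start], set()
    while stack:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        order.append(url)
        # Reverse so that the first link is popped (visited) first.
        stack.extend(reversed(links.get(url, [])))
    return order


if __name__ == "__main__":
    print(breadth_first("toc", LINKS))
    print(depth_first("toc", LINKS))
```

In the depth-first order, page1_nav and page1_main are requested right after page1, before page2 is touched, which is the behaviour described above.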
Kevork Kevorkian 12/17/2005 05:55 am
If I paste the list with the links in the Addresses (URLs) field like this:

http://www.accessmedicine.com/popup.aspx?aID=799212
http://www.accessmedicine.com/popup.aspx?aID=799222
http://www.accessmedicine.com/popup.aspx?aID=799235
http://www.accessmedicine.com/popup.aspx?aID=799250
http://www.accessmedicine.com/popup.aspx?aID=799515

Level limit = 0

I download the pages successfully.

Setting the number of connections to 1 and the delay between downloads to 1-3 is recommended. After a couple of downloads I switch the number of connections to 2 and the delay to 0.

To obtain the list with the links to be included, you can do one download that includes all the wrong pages, with arbitrary project settings. From the Map tab you can copy the link URLs and paste them into a second project with the settings given above. I sorted the list with the links alphabetically before pasting it. Perhaps it should work the other way too.

Hope this will help. I`m waiting for your feedback.
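Preparing the pasted list can be sketched in Python as follows: keep only the popup.aspx?aID=... links copied from the Map tab, drop repeats and sort them. The function name is hypothetical, and sorting by numeric aID instead of alphabetically is an assumption made for this sketch.

```python
from urllib.parse import urlparse, parse_qs


def popup_urls(copied_links):
    """Keep popup.aspx?aID=... links, drop duplicates, sort by numeric aID."""
    by_id = {}
    for link in copied_links:
        link = link.strip()
        parsed = urlparse(link)
        ids = parse_qs(parsed.query).get("aID")
        if parsed.path.endswith("/popup.aspx") and ids and ids[0].isdigit():
            # setdefault keeps the first link seen for each aID.
            by_id.setdefault(int(ids[0]), link)
    return [by_id[key] for key in sorted(by_id)]
```

The returned lines can be pasted one per line into the Addresses (URLs) field of the second project.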