Help, please: downloading a website

Author Message
Defat 12/24/2011 01:33 am
Hi, I'm hoping someone can help me figure out how to get a series of images from www.familysearch.org. I'm currently using the trial version of OOE, and if we can get this figured out, I will soon be a very happy customer!

These are examples of the URLs that are giving me trouble. OOE doesn't find the URL in the page's source (I got it by right-clicking on the web page and selecting Copy Shortcut).

The image I want to save from the site:

https://familysearch.org/search/image/save.jpg?url=https%3A%2F%2Ffamilysearch.org%2Fpal%3A%2FMM9.3.1%2FTH-1-16091-34723-90%3Fcc%3D1851040%26wc%3D11692906


The link to the next page (with the next image) on the website:
https://familysearch.org/pal:/MM9.3.1/TH-1-16091-34734-5?cc=1851040&wc=11692906

I am not very experienced with OOE or HTML yet.

I've been reading the Help and some of the messages on this forum, but haven't been able to figure it out. The links above are not in the page source, and OOE doesn't seem to be able to figure them out either.

Here is a link to the first page in the series I am trying to download:
https://familysearch.org/pal:/MM9.3.1/TH-1-16091-34723-90?cc=1851040&wc=11692906
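Comparing the three links above, the image link appears to be just the page link, fully percent-encoded, passed as the url parameter of save.jpg. The Python sketch below checks that observation in both directions. It assumes the save.jpg endpoint and encoding seen in these particular links hold for the other pages in the series; page_to_save_url and save_url_to_page are hypothetical helper names, not part of OOE or the site.

```python
from urllib.parse import quote, urlsplit, parse_qs

# Endpoint taken from the save link quoted in this thread (an assumption:
# the site may use a different one for other collections or change it later).
SAVE_ENDPOINT = "https://familysearch.org/search/image/save.jpg"


def page_to_save_url(page_url: str) -> str:
    """Percent-encode the whole page URL (safe="" escapes ':', '/', '?', '=', '&')
    and pass it as the 'url' query parameter of the save endpoint."""
    return SAVE_ENDPOINT + "?url=" + quote(page_url, safe="")


def save_url_to_page(save_url: str) -> str:
    """Recover the page URL from a save link; parse_qs decodes the %XX escapes."""
    query = parse_qs(urlsplit(save_url).query)
    return query["url"][0]


if __name__ == "__main__":
    first_page = ("https://familysearch.org/pal:/MM9.3.1/"
                  "TH-1-16091-34723-90?cc=1851040&wc=11692906")
    # Prints the save link for the first page in the series.
    print(page_to_save_url(first_page))
```

Run against the first page link, this reproduces the image link quoted above character for character, which supports the idea that the page and image addresses are closely related.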

I currently have the following commands in the project:

Additional=DepthFirst
Additional=ParseIncludedScripts
Additional=SkipDisposition

On the Parsing page, I have enabled Evaluate script calculations.

Does anyone have any ideas or help, please? I don't really care about the pages, except that I need OOE to get to the next page so it can see the next image, and so on.

Sure, the images are there and free, but they are not indexed, and I would like to have them as thumbnails so I can skip around between years and months of documents more quickly.

I could use some help. Thank you in advance,

Defaut
Oleg Chernavin 12/25/2011 10:39 am
Yes, the site is not easy to get images from; the scripts are quite complex. However, the URLs of the page and the image are very similar. You could use URL Substitutes (Properties dialog, Parsing section) to get them.

Add the rule:

URL:
https://familysearch.org/pal
Replace:
?cc=
With:
.jpg?cc=
Apply to:
URLs
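To preview what a rule like this does to an address before running a project, here is a small Python model of it. It assumes, for illustration only, that the URL field is matched as a prefix of the link and that every occurrence of the Replace text is substituted; Offline Explorer's actual matching may differ, and apply_substitute is a hypothetical helper, not an OOE function.

```python
def apply_substitute(url: str, url_prefix: str, find: str, replace_with: str) -> str:
    """Hypothetical model of one URL Substitutes rule: leave URLs that do not
    start with url_prefix alone; otherwise replace every occurrence of find."""
    if not url.startswith(url_prefix):
        return url
    return url.replace(find, replace_with)


if __name__ == "__main__":
    # The "next page" link from the first post.
    page = "https://familysearch.org/pal:/MM9.3.1/TH-1-16091-34734-5?cc=1851040&wc=11692906"
    # Rule values as given above: URL, Replace, With.
    print(apply_substitute(page, "https://familysearch.org/pal", "?cc=", ".jpg?cc="))
```

For the next-page link this yields the same address with .jpg inserted before ?cc=. Whether the site actually serves the image at that address is up to the site; the sketch only shows the string rewrite.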

Would this help?

Best regards,
Oleg Chernavin
MP Staff