Cannot extract articles linked to outside webpages

Aliv 01/15/2009 06:19 pm
I would like to download all the articles from this website:

http://www.benjaminshapiro.com/

When I run the download, it appears to save everything on the website itself, but the archived articles, which are most of them, are not saved. Most of these are hosted on other websites.

Most articles link to a website called townhall.com, but the program does not reach outside to capture articles on external websites. Also, some articles are long and require the reader to click "next" or "page 2" to continue. This also needs to happen automatically.

My settings are a level limit of 3, and I selected load from any location.

Please advise.
Oleg Chernavin 01/15/2009 07:04 pm
I looked at the site, and Level=3 should be enough. However, if you need to follow further links on those other sites, you will need a higher level, and that will cause too many unwanted links to be followed.
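
To show why a higher level pulls in unwanted links, here is a minimal Python sketch of a level-limited crawl over a toy link graph. It is not Offline Explorer's code. The page names in LINKS are hypothetical, and it assumes that the starting page counts as level 0 and each followed link adds one level.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "home": ["archive", "about"],
    "archive": ["article-1", "article-2"],
    "article-1": ["article-1-page-2", "ad"],
    "article-2": ["related"],
}

def pages_within_level(links, start, level):
    """Breadth-first walk; return {page: depth} for pages at most `level` links from start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if seen[page] == level:
            continue  # level limit reached, do not follow links from this page
        for target in links.get(page, []):
            if target not in seen:
                seen[target] = seen[page] + 1
                queue.append(target)
    return seen

if __name__ == "__main__":
    for level in (1, 2, 3):
        print(level, sorted(pages_within_level(LINKS, "home", level)))
```

In this toy graph, Level=3 is the first setting that reaches the second page of an article, and it also picks up the "ad" and "related" pages at the same depth.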

Perhaps a semi-automatic approach would be better here. I suggest that you select the Project that partially downloaded the site, click the AutoSave button on the Internal Browser toolbar, and then click Browse. Then browse the articles and click the Next Page buttons or links. Offline Explorer Pro should download the missing files and add them to the Project.

Best regards,
Oleg Chernavin
MP Staff
Aliv 01/19/2009 05:20 pm
Thank you for the response.
I am trying the extraction on a different page with many more articles, hosted on different sites such as nytimes, herald, cnn, etc. The method you presented does work: I opened my project, browsed in the internal browser, went to the missing links, and started clicking and loading them. The problem is that it would take far too many hours to sit, click, and wait for each missing link to load. There seem to be hundreds, especially on the new site I'm trying.

I would like to set it to go outside the main site, but only to the first page of each article on the outside sites. Even if there is a page 2 or 3, I just want the first one. Can you please advise what the best method for extracting would be? How can I set Offline Explorer to download everything, plus the articles on outside servers, but just the first page? I'm not too concerned about file size, so long as the entire site can be viewed offline. Thank you. The new site is judicial-inc.org
Thanks

Oleg Chernavin 01/20/2009 06:34 am
If so, just use the Properties dialog - URL Filters - Server section. Check "Load only from starting server" and also check "Load up to 1 links on other servers".
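
As a rough illustration of what this pair of settings is meant to achieve, the rule can be sketched in Python. This is only a sketch, not how Offline Explorer implements it: the function name next_hops, the example hostnames, and the choice to reset the count when a link returns to the starting server are all assumptions.

```python
from urllib.parse import urlparse

def next_hops(start_url, link_url, hops_so_far, max_off_server=1):
    """Return the off-server hop count after following link_url, or None to skip the link.

    hops_so_far is how many consecutive links away from the starting server
    the current page already is (0 for pages on the starting server).
    """
    start_host = urlparse(start_url).hostname
    link_host = urlparse(link_url).hostname
    if link_host == start_host:
        return 0  # assumption: returning to the starting server resets the count
    hops = hops_so_far + 1
    return hops if hops <= max_off_server else None

if __name__ == "__main__":
    start = "http://start.example/"  # hypothetical starting server
    # Link from the starting site to an outside article: followed.
    print(next_hops(start, "http://other.example/article", 0))
    # "Page 2" link found on that outside article: skipped.
    print(next_hops(start, "http://other.example/article?page=2", 1))
```

With a limit of 1, the first page of each outside article is loaded, but links found on that page (such as "page 2") are not followed.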

Oleg.
thebigfatgeek 07/13/2009 01:45 am
Hi there

I use Google Reader to browse my RSS feeds, and when I find an interesting article/topic I "star" the topic for later retrieval. I would like to use OE to download the starred topics and the attachments associated with those starred topics. They are normally 2 levels deep, i.e.:

Google Reader -> Topic Detail Site -> Attachment Site

where Topic Site and Attachment Site are different from each other. I only want it to download in this vertical fashion, and not at all horizontally across each level.

Any ideas how to achieve that?

Oleg Chernavin 07/13/2009 08:57 am
I think you can simply give Offline Explorer the URL of the RSS feed and let it download with Level=1 or 2. It understands RSS and can also convert it to HTML for easier offline reading.
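
For anyone curious what "convert RSS to HTML" amounts to, here is a small Python sketch using only the standard library. It is not Offline Explorer's converter. The sample feed, its URLs, and the function names rss_items and to_html are made up for illustration, and it assumes a plain RSS 2.0 feed without namespaces.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical RSS 2.0 feed, inlined so the sketch runs without network access.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First topic</title><link>http://topics.example/1</link></item>
    <item><title>Second &amp; last</title><link>http://topics.example/2</link></item>
  </channel>
</rss>"""

def rss_items(rss_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]

def to_html(items):
    """Render the (title, link) pairs as a simple HTML list of links."""
    rows = "\n".join(
        f'<li><a href="{escape(link)}">{escape(title)}</a></li>'
        for title, link in items
    )
    return f"<ul>\n{rows}\n</ul>"

if __name__ == "__main__":
    print(to_html(rss_items(SAMPLE_RSS)))
```

Each link in the feed is one level below the feed itself, which is why Level=1 reaches the topic pages and Level=2 reaches whatever those pages link to.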

Oleg.