Downloading Forum Thread question...
|VoodooChile||09/13/2009 12:55 pm|
I have a pretty typical request but cannot seem to find an answer on the forum...
I am trying to download a specific thread in a forum where I want to do the following:
1 - Download the main page I specify (Page 1 of the thread ONLY).
2 - Download the images on that page (only those hosted on one specific site, excluding the others)
3 - NOT download the other links on the page, which are to other subsequent thread pages (page 2, 3, 4...) and other sites.
The issue I am having is -
Q - How do I download image content from only *ONE* external site (e.g. Imageshack) and *EXCLUDE* the others (say, tinypic and photobucket)? The page has images linked from several image sites and I am only interested in content from Imageshack. In fact, the best way would be to exclude every site BUT Imageshack, and I cannot find a way to do it. Also, I want the Imageshack depth to be 2: the link on the main thread page is a thumbnail, and I want OEE to go one level deeper and download the actual Imageshack page that displays the image at its full size.
Q - I also do not want to download the other pages that are linked from this page. As you know, most forum threads have links to subsequent pages (page 2, 3, 4, 5 etc.), and I do not want OEE to follow those links. I only want to download the page I am on and its associated images from a single site, and exclude the rest.
Hope I made sense...I am more than happy to explain further if what I stated did not make sense.
Thanks in advance
|Oleg Chernavin||09/14/2009 12:48 pm|
|I think this is easy. Set up a Project with the forum page URL and Level=1, and make sure that all File Filters have "Use URL Filters settings" in their Location fields. Add imageshack.com as an Included server keyword in the URL Filters - Server section and uncheck the "Load only from starting..." box. Uncheck the "Load from starting directory" box in URL Filters - Directory. This should do the trick.
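To illustrate what the Included server keyword is meant to achieve, here is a minimal Python sketch. It is not how Offline Explorer works internally and uses none of its APIs; the function names and the sample URLs in the usage note are hypothetical. It only shows the idea of keeping the starting page plus links whose server name contains an included keyword.

```python
from urllib.parse import urlparse


def server_matches(url, included_keywords):
    """Return True if the URL's host name contains any included keyword."""
    host = (urlparse(url).hostname or "").lower()
    return any(keyword.lower() in host for keyword in included_keywords)


def select_downloads(start_url, linked_urls, included_keywords):
    """Keep the starting page plus every link on an included server."""
    selected = [start_url]
    for url in linked_urls:
        if server_matches(url, included_keywords):
            selected.append(url)
    return selected
```

For example, calling select_downloads with the keyword list ["imageshack.com"] would keep a link to www.imageshack.com and drop links to tinypic or photobucket, as well as links back to the forum's own page 2, 3, 4 and so on.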
|VoodooChile||09/21/2009 02:33 am|
|Thanks Oleg. I tried your suggestion and we are close, but not quite there. Here is the issue:
The page is correctly downloaded with ONLY the text from my main page (page 1 of the forum) and ONLY the thumbnail images from Imageshack. What is critically lacking is the page that each thumbnail image links to.
The larger image that the Imageshack thumbnail links to is not downloaded. As I stated in my original post, the main forum page has thumbnails that link to the actual full-size images, and that second-level Imageshack page is not being copied (btw, I only want the image and text on that page, not its other links; no sense going deeper there).
So think of it this way in terms of pseudo HTML -
MainPage HTML contains:
<thumbnail image - location is for example: http://www.imageshack.com/thumb/95wiT.jpg, and linked to address like this: http://www.imageshack.com/view/95wiT>
<Text>Additional text on the page</TEXT>
<END OF PAGE>
Now what is happening is that the thumbnail image is in fact correctly downloaded and shown, BUT the address linked to the thumbnail, http://www.imageshack.com/view/95wiT, is not being followed, so I am not getting that crucial page with the full-size image. All I have currently is the thumbnail image on my main forum page, and when I click on the thumbnail the OEE browser gives me the error message "Document not found" and says that the page is not available offline. Interestingly, the error message actually gives me the link to the page that did not get downloaded, and when I click on it, it takes me to that very page with the full-size image on Imageshack.
Hope this pseudo HTML made it easier to illustrate my point.
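In real HTML the structure described above would look roughly like an img tag wrapped in a link. The Python sketch below, using only the standard library, pulls out each thumbnail together with the address it links to, which is the second-level page that is not being followed. The SAMPLE markup is an assumption based on the pseudo HTML, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ThumbnailLinkParser(HTMLParser):
    """Collect (thumbnail src, linked href) pairs for images inside links."""

    def __init__(self):
        super().__init__()
        self._current_href = None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._current_href = attrs.get("href")
        elif tag == "img" and self._current_href and attrs.get("src"):
            self.pairs.append((attrs["src"], self._current_href))

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


# Assumed real-HTML equivalent of the pseudo HTML in the post.
SAMPLE = (
    '<a href="http://www.imageshack.com/view/95wiT">'
    '<img src="http://www.imageshack.com/thumb/95wiT.jpg"></a>'
    "<p>Additional text on the page</p>"
)


def thumbnail_pairs(html):
    """Return every (thumbnail, linked page) pair found in the markup."""
    parser = ThumbnailLinkParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

The thumbnail is an embedded image of the main page, while the linked address is a separate page one level further from the starting URL, which may be why the two are treated differently by a level limit.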
|Oleg Chernavin||09/21/2009 09:24 am|
|I think I need to see the actual page to advise on the settings. Can you give me access to it?
|VoodooChile||09/23/2009 01:59 am|
Can you please send me an email offline at firstname.lastname@example.org so I can send you the link to the page? It is, shall we say, of a mature variety (do not worry, it's not porn) and I do not want to post it openly on the public site.
|Oleg Chernavin||09/23/2009 02:01 am|
|OK. I just sent you a letter.
|Rio||10/08/2009 11:31 am|
I have the same question as described above.
I want to download this forum thread,
but I don't know how to set it up.
This is the forum address: http://www.forexindo.com/forum/analisa-teknikal/133-trading-style.html
and its next pages: http://www.forexindo.com/forum/analisa-teknikal/133-trading-style-2.html; http://www.forexindo.com/forum/analisa-teknikal/133-trading-style-3.html and so on.
I want to download those pages with their pictures and other attachments, but not the other links (somehow they are added in members' signatures).
I've tried many kinds of offline site explorers,
and I wonder if MP OEE has a feature to do this.
Sorry for my poor English.
|Oleg Chernavin||10/08/2009 11:37 am|
|This should be simple. Please use the starting URL:
(change the number 100 to the actual last page number). Set Level to 0 and download it.
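The starting URL itself is missing from this post, and the sketch below does not try to reconstruct it or Offline Explorer's URL syntax. It only shows, in Python, which page addresses such a numbered range would need to cover, assuming the naming pattern from Rio's post (the first page has no number, later pages end in -2, -3, ...). The function name is hypothetical.

```python
def thread_page_urls(base, last_page):
    """List the addresses of pages 1..last_page of a forum thread.

    Assumes the pattern seen in the thread: page 1 is "<base>.html",
    page n (n >= 2) is "<base>-<n>.html".
    """
    urls = [f"{base}.html"]
    urls += [f"{base}-{n}.html" for n in range(2, last_page + 1)]
    return urls
```

With Level set to 0, only the listed pages and their embedded files would be loaded, so links in member signatures are not followed.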
|Rio||10/08/2009 12:23 pm|
|Thanks for the hints,
but it can't be saved to CHM.
And even if it could be,
I have another problem.
When I view the resulting files in Firefox, the pages are not linked to each other: I open the 1st page and then want to go to the next page (or jump to another one) by clicking the page navigator, but the links do not point to the other downloaded pages.
Can I have this ability,
so that I get a powerful CHM file as the result?
|Oleg Chernavin||10/09/2009 05:15 am|
|I downloaded a Project with the URL above, exported it to CHM and the links to 1 2 3... pages worked well in the HTML Help file.
|Van dermanser||10/20/2009 02:46 am|
|I'm trying to download pages from another forum,
but the resulting pages don't include the images from the pages.
The pages are HTML.
Example address: http://www.forex-tsd.com/manual-trading-systems/8134-kg-trapping-mode-5.html
Any suggested settings?
|Oleg Chernavin||10/20/2009 04:14 am|
|Log on to the site in the Internal browser before the download. Then set the Project's Level to 1 and download.