Oddity in forum download

Eric Hart
06/21/2009 06:41 pm
Hi,

I am evaluating Offline Explorer 5.5. A major use for me is in downloading the contents of user forums (boards like this one!)

The first board I tried came up with an oddity: all posters are shown with the same avatar.

For example: go to this site in IE: http://community.acdsee.com/forums/topic/recent-widespread-rumors

You will see that the first and second poster each have their own different avatar.

But when I download this site with Offline Explorer, both posters are shown with the same avatar (the second one, as it happens). I did the download by right-clicking "download this page" and "download this link" in IE 8.

Please let me know if you see this same problem and if there is an explanation.

I would also like to hear if many people are using the program for forum exploration, and whether the PRO version is important for this use.

Cordially,

Eric Hart
Oleg Chernavin
06/22/2009 07:34 am
Can you please install Offline Explorer Pro version:

http://www.metaproducts.com/download/opsetup.exe

Add the following line to the Project's URLs field after the starting URL:

Additional=SkipDisposition

I am sorry for this issue. It happens when the web server supplies an alternative filename and this name is the same for many images/files on a page. Such a situation is rare, but I am thinking about how to add an intelligent automatic check for such collisions in the future.
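
The collision described above can be illustrated with a minimal Python sketch. It is not Offline Explorer's implementation: the two avatar URLs and the header values are hypothetical, and the sketch only assumes that each file is saved under the name suggested in the server's Content-Disposition header.

```python
from email.message import Message
from urllib.parse import urlsplit
import posixpath


def suggested_name(content_disposition):
    """Return the filename parameter of a Content-Disposition header value."""
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    return msg.get_filename()


def url_name(url):
    """Return the last path segment of the URL."""
    return posixpath.basename(urlsplit(url).path)


# Hypothetical responses: two different avatar URLs, same suggested filename.
responses = [
    ("http://example.com/avatars/101.jpg", 'inline; filename="avatar.jpg"'),
    ("http://example.com/avatars/202.jpg", 'inline; filename="avatar.jpg"'),
]

by_header = {}  # local filename taken from the header
by_url = {}     # local filename taken from the URL itself
for url, header in responses:
    by_header[suggested_name(header)] = url  # second image replaces the first
    by_url[url_name(url)] = url              # each image keeps its own name
```

Naming by header leaves a single local file holding the second image, which matches the symptom reported in this thread. Naming by URL keeps both images apart.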

Best regards,
Oleg Chernavin
MP Staff
Eric Hart
06/22/2009 06:09 pm
Thanks Oleg, I appreciate your quick responses here!

Some more questions for you:

1. Is the Pro version required in order to deal with this issue? If so, that helps me answer my question about standard vs. pro!

2. I think I have found two bugs in OE, but perhaps I misunderstand. The first is that the "find all terms" checkbox is ignored when doing an indexed search. This means that *all* indexed searches use an implied "or" (I usually want the other behavior). The second is that the "levels" checkbox in the "view" menu seems not to work. The former is the only one that's important to me -- can you let me know if that is fixed in Pro?

3. Finally, I am finding that efficiently downloading forums is something of a project. I really like the filtering capabilities, particularly the ability to "test" a URL. Even so, fine-tuning it is taking me some time. Is there a set of guidelines in the manual or on the forums specifically to help people in downloading forum content? The main problem being that the same content gets downloaded in multiple ways because there are lots of ways to get to the same content, so lots of different links that go to the same places.

Cordially,

Eric Hart

> Can you please install Offline Explorer Pro version:
>
> http://www.metaproducts.com/download/opsetup.exe
>
> Add the following line to the Project's URLs field after the starting URL:
>
> Additional=SkipDisposition
>
> I am sorry for this issue. It happens when web server supplies alternative filename and this name is the same for many images/files on a page. Such situation is rare, but I am thinking on how to make an intellectual automatic check for such collisions in future.
>
> Best regards,
> Oleg Chernavin
> MP Staff
Oleg Chernavin
06/23/2009 05:54 am
1. I am sorry, but yes, Pro edition is required.

2. Thank you! I fixed that. It was only on the standard edition. Pro edition didn't have this issue.

3. Sorry, it is true that fine-tuning downloads is really tough. You have to look through the links on the pages you want to download, etc. We are working on a visual technology to do this, but it is still incomplete.

Meanwhile, you may describe (in detail) what you need to download and I will be here to assist you.

Oleg.
Eric Hart
06/29/2009 08:13 pm
> 1. I am sorry, but yes, Pro edition is required.
>
> 2. Thank you! I fixed that. It was only on the standard edition. Pro edition didn't have this issue.
>
> 3. Sorry, it is true that fine-tuning downloads is really tough. You have to look through the links on the pages you want to download, etc. We are working on a visual technology to do this, but it is still incomplete.
>
> Meanwhile, you may describe (in details) what you need to download and I will be here to assist you.
>
> Oleg.

OK, Oleg, here you go!

I've been trying hard to get this forum to download the way I want. The postings on the first page download fine. The problem is in downloading postings on subsequent pages. My latest attempt (unsuccessful) has been to specify pages 2, 3, 4, etc. individually as URLs. I'd prefer to do it based on a single URL (so I'd get all the pages, not just the ones I specify) but not sure how to do that without getting lots of "looping" downloads of pages. In any case, the links on the first page work fine, but the second and later pages just say "Document not found. This page is not available offline. Etc." (Actually a few pages are found, but I'm pretty sure these are cached from previous nearly successful attempts).

Here's my setup:

Addresses (URLs):
http://forums.sandisk.com/sansa/board?board.id=sansafuse
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=2
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=3
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=4
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=5
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=6
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=7
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=8
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=9
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=10
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=11
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=12
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=13
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=14
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=15
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=16
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=17
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=18
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=19
http://forums.sandisk.com/sansa/board@board.id=sansafuse&page=20

Level limit: 2 (I think 1 should actually be enough in this setup).

Download only new and modified files.

Under file filters, all file types are checked.

Under URL Filters, the only filter type I'm using is "Filename", and I have three entries there:
board@board.id=sansafuse
message?board.id=sansafuse&thread.id=
message@board.id=sansafuse&thread.id=

Everything else is set to defaults (I tried to simplify this last attempt).

I appreciate any help!

Cordially,

Eric Hart
Oleg Chernavin
06/30/2009 07:30 am
I think you need to change it this way:

URLs:
http://forums.sandisk.com/sansa/board?board.id=sansafuse

Level limit: unlimited (uncheck it) (to follow 2 3 4... pages and posts on them)

Download only new and modified files.

URL Filters - Filename - Included list:

board?board.id=sansafuse
message?board.id=sansafuse&thread.id=
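
As a rough illustration of what an included list like this does, here is a small Python sketch. It assumes the filter is a plain substring match against the URL, which may not be exactly how Offline Explorer evaluates it. The function name is_included and the profile URL are hypothetical.

```python
# Keywords copied from the included list above.
INCLUDED = [
    "board?board.id=sansafuse",
    "message?board.id=sansafuse&thread.id=",
]


def is_included(url, keywords=INCLUDED):
    """Return True if the URL contains at least one included keyword."""
    return any(keyword in url for keyword in keywords)


# A later board page passes the filter.
page_2 = "http://forums.sandisk.com/sansa/board?board.id=sansafuse&page=2"
# Hypothetical URL outside the board; it is filtered out.
profile = "http://forums.sandisk.com/sansa/user/profile"

print(is_included(page_2))
print(is_included(profile))
```

Under this assumption the crawl can follow page 2, 3, 4... links and thread links without a level limit, because anything that does not contain one of the two keywords is never queued.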

Oleg.
Eric Hart
07/02/2009 11:19 pm
I'm trying this now. I did something like this before, and it seemed like it "looped" and downloaded the same pages multiple times. It seemed the site referenced the same pages in different ways. Not positive, though, so I'm trying again. After the first try, I restricted it to text only, and only from the starting directory down (I can't download a bunch of images -- too little time in the Internet cafe).

Question: Is there a way to "login" to the site? It's not required, but this would read my site preferences, which are to put a lot more items on each page (more efficient, I think, than lots of multiple pages). The site doesn't pop up a login prompt, so to do this, I would have to be able to tell OExplorer what the login page URL is.


> I think, you need to change it this way:
>
> URLs:
> http://forums.sandisk.com/sansa/board?board.id=sansafuse
>
> Level limit: unlimited (uncheck it) (to follow 2 3 4... pages and posts on them)
>
> Download only new and modified files.
>
> URL Filters - Filename - Included list:
>
> board?board.id=sansafuse
> message?board.id=sansafuse&thread.id=
>
> Oleg.
Oleg Chernavin
07/03/2009 07:14 am
Downloading the same URL multiple times is impossible - Offline Explorer controls that. However, if the same page has different URLs, then it is possible. For example:

http://forums.sandisk.com/sansa/board?board.id=sansafuse&page=12
and
http://forums.sandisk.com/sansa/board?page=12&board.id=sansafuse

In most cases these will result in the same page, but since the URLs are different, Offline Explorer would load both of them. However, this is not the case on most sites.
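
To make the point concrete, here is a small Python sketch that treats two such URLs as equal by sorting their query parameters. It is only an illustration of why the URLs differ as strings. It is not something Offline Explorer is stated to do, and canonical_url is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url):
    """Return the URL with its query parameters sorted by name and value."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(pairs), parts.fragment)
    )


a = "http://forums.sandisk.com/sansa/board?board.id=sansafuse&page=12"
b = "http://forums.sandisk.com/sansa/board?page=12&board.id=sansafuse"

print(a == b)                                # the raw strings differ
print(canonical_url(a) == canonical_url(b))  # the canonical forms match
```

Sorting is safe only when the server ignores parameter order, which is the usual case but not guaranteed.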

To simplify the download task, I would suggest using Offline Explorer Pro with the starting URL:

http://forums.sandisk.com/sansa/board?board.id=sansafuse&page={:1..170}

This way you can set Level to 1.
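
For readers without the Pro edition at hand, the following Python sketch shows what such a starting URL amounts to, assuming the {:1..170} macro stands for every page number from 1 through 170. The template placeholder and the expand_pages helper are hypothetical names used only for this illustration.

```python
# Same starting URL as above, with the macro replaced by a Python placeholder.
TEMPLATE = "http://forums.sandisk.com/sansa/board?board.id=sansafuse&page={page}"


def expand_pages(template, first, last):
    """Return one URL per page number from first to last, inclusive."""
    return [template.format(page=number) for number in range(first, last + 1)]


urls = expand_pages(TEMPLATE, 1, 170)
print(len(urls))
print(urls[0])
print(urls[-1])
```

Because every board page is then a starting URL, the threads are only one link away from a starting page, which is why a level limit of 1 is enough.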

Logging on to the site is easy - just log on to it online in the Internal browser and the download will use the same cookie.

Oleg.