Grabbing Yahoo messages

Patrick 03/29/2006 07:02 am
I have been trying to use Offline Explorer Pro to grab the messages from a Yahoo group (no, I am not grabbing from a photo album). I use a project URL like this (for a group called foobar):

http://groups.yahoo.com/group/foobar/message/{:1..500}

(using a directory filter of message)

The problem is that Yahoo inserts a redirect while you read messages: after every few messages it redirects to an ad page that offers an option to continue on to the message. OEP gets stuck there and tends to skip retrieving that message. I tried changing the scan depth, but that doesn't seem to help. OEP does report a 302 status saying that the object has moved.

Is there any workaround for this ?
Oleg Chernavin 03/29/2006 07:02 am
Yes sure. First, please select Custom configuration in URL Filters | Directory and add to the Included keywords list:

group/foobar

Now select Level=0 and start loading. This should help.

Best regards,
Oleg Chernavin
MetaProducts corp.

George 03/29/2006 07:02 am
> Yes sure. First, please select Custom configuration in URL Filters | Directory and add to the Included keywords list:
>
> group/foobar
>
> Now select Level=0 and start loading. This should help.
>
> Best regards,
> Oleg Chernavin
> MetaProducts corp.

I tried this and still not all messages are downloaded. For example, from http://groups.yahoo.com/group/BardonPraxis/message/{:1..1840} about 100 messages are not downloaded properly: each is a 180-byte file containing only the following:

"<HTML><HEAD><META HTTP-EQUIV="Refresh" CONTENT="0; URL=../interrupt@st=1&h=360&m=1&done=_252Fgroup_252FBardonPraxis_252Fmessage_252F360"><TITLE>302 File moved</TITLE></HEAD></HTML>"

So I deleted those and then used "Download missing files". OE got about 80 of the messages, and the remaining 20 were redirects again, so I repeated the same procedure for those 20, and so on until I had all 100 messages correctly. Though this works, it could be very time consuming with larger message boards. So is there any other way to get all messages?
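The "delete the stubs, then re-download" step above could be scripted. What follows is only a rough sketch in Python, not a feature of Offline Explorer: the folder path is whatever directory your project downloads into (hypothetical here), the 1024-byte size limit is an arbitrary guess, and the pattern is taken from the single stub quoted above, so it may need adjusting.

```python
import re
from pathlib import Path

# Pattern based on the one stub quoted above; the "interrupt" target
# is an assumption and may differ for other groups or change over time.
STUB_RE = re.compile(
    rb'<META\s+HTTP-EQUIV="Refresh"[^>]*URL=[^">]*interrupt',
    re.IGNORECASE,
)


def is_redirect_stub(data: bytes, max_size: int = 1024) -> bool:
    """Return True if data looks like a small ad-redirect placeholder."""
    return len(data) <= max_size and STUB_RE.search(data) is not None


def find_redirect_stubs(folder: str, max_size: int = 1024) -> list[Path]:
    """List small files under folder that contain the redirect stub."""
    stubs = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.stat().st_size <= max_size:
            if is_redirect_stub(path.read_bytes(), max_size):
                stubs.append(path)
    return stubs


def delete_redirect_stubs(folder: str) -> int:
    """Delete the stubs so that "Download missing files" fetches them again."""
    stubs = find_redirect_stubs(folder)
    for path in stubs:
        path.unlink()
    return len(stubs)
```

Running `find_redirect_stubs` first and checking the list by eye before calling `delete_redirect_stubs` is probably the safer order. This only saves the manual hunting for bad files; it does not stop the redirects themselves.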
Oleg Chernavin 03/29/2006 07:02 am
Well, no. Yahoo inserts ads in place of some of the messages you are trying to view. This is why such files appear: Yahoo redirects you to a page with ads.

Oleg.
Franz Muell 03/29/2006 07:02 am
With the old OEP version 2.2.808 I could use the project url

http://login.yahoo.com/config/login
post=.done=http%3A%2F%2Fgroups.yahoo.com%2Fadultconf%3Faccept%3DI%2520Accept%26dest%3D%252F&login=USERID&passwd=PASSWORD&.persistent=y
(level limit 0)

to get all cookies set appropriately to be logged in and be identified as adult.
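For anyone puzzled by the long POST string above, it can be unpacked with a few lines of Python. This is just an illustration of how the value is nested, assuming the leading `post=` is Offline Explorer's marker for the request body and not a form field itself; the function name is made up for this sketch.

```python
from urllib.parse import parse_qs, urlsplit

# The request body from the project above, without the "post=" prefix.
POST_BODY = (
    ".done=http%3A%2F%2Fgroups.yahoo.com%2Fadultconf%3Faccept%3DI%2520Accept"
    "%26dest%3D%252F&login=USERID&passwd=PASSWORD&.persistent=y"
)


def decode_login_post(body: str):
    """Split the form body into fields and unpack the nested .done URL."""
    fields = parse_qs(body)                        # outer form fields
    done_url = fields[".done"][0]                  # URL to visit after login
    inner = parse_qs(urlsplit(done_url).query)     # that URL's own parameters
    return fields, done_url, inner


fields, done_url, inner = decode_login_post(POST_BODY)
print(done_url)  # http://groups.yahoo.com/adultconf?accept=I%20Accept&dest=%2F
print(inner)     # {'accept': ['I Accept'], 'dest': ['/']}
```

So the one request carries the login fields plus a `.done` URL that in turn submits the "I Accept" confirmation, which is why the values are percent-encoded twice.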
A second project

http://groups.yahoo.com/group/GROUP1/message/{:NUM1..NUM2}
http://groups.yahoo.com/group/GROUP2/message/{:NUM3..NUM4}
http://groups.yahoo.com/group/GROUP3/message/{:NUM5..NUM6}
(level limit 0, urlfilter all directories)

on first run then retrieved most of the messages, only leaving out
a few where the advertisement page had been inserted (every fourth
message was redirected to an interrupt... url but most of those
redirected back, and those messages were also loaded).
A second run retrieved all the remaining messages.

Later versions of OEP didn't do this any longer: the automatic login
stopped with "download complete" long before everything necessary
had been touched, every fourth message was missing (those which
had been redirected through the interrupt... url), and a second
attempt gave the result "nothing to be done".

Can you find out what caused the change in behavior?
Radek 03/29/2006 07:02 am
I've been using Offline Explorer together with TextPipe to download and glue together all messages from a group, to make a single thread-aware big tree (HTML file) or even a format that can be imported into an NNTP news reader (e.g. Outlook Express/40tude Dialog).

I use the following URL (with STEP set to max value, i.e. 100):
http://groups.yahoo.com/group/GROUP_NAME/messages/{LAST_MESSAGE_NO..FIRST_MESSAGE_NO|STEP}?threaded=1&viscount=-STEP&expand=1

For example:
http://groups.yahoo.com/group/textpipe-discuss/messages/{496..1|30}?threaded=1&viscount=-30&expand=1
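To see which pages a pattern like the one above would request, here is a small Python sketch that expands the range macro. It assumes `{A..B|STEP}` means "count from A toward B in steps of STEP, both ends included" and that a leading colon or a missing step is allowed; that is my reading of the examples in this thread, not a description of how Offline Explorer implements it.

```python
import re

# Matches {496..1|30} as well as {:1..500}; syntax assumed from this thread.
MACRO_RE = re.compile(r"\{:?(\d+)\.\.(\d+)(?:\|(\d+))?\}")


def expand_range_macro(template: str) -> list[str]:
    """Expand the first {A..B|STEP} macro in template into a list of URLs."""
    match = MACRO_RE.search(template)
    if match is None:
        return [template]
    start, end = int(match.group(1)), int(match.group(2))
    step = int(match.group(3) or 1)
    if step == 0:
        raise ValueError("step must not be zero")
    if start <= end:
        values = range(start, end + 1, step)    # counting up
    else:
        values = range(start, end - 1, -step)   # counting down
    head, tail = template[:match.start()], template[match.end():]
    return [f"{head}{value}{tail}" for value in values]


urls = expand_range_macro(
    "http://groups.yahoo.com/group/textpipe-discuss/messages/"
    "{496..1|30}?threaded=1&viscount=-30&expand=1"
)
print(len(urls))  # 17
print(urls[0])    # .../messages/496?threaded=1&viscount=-30&expand=1
print(urls[-1])   # .../messages/16?threaded=1&viscount=-30&expand=1
```

Under that reading, 496 messages need only 17 page requests instead of 496, which also means far fewer chances to hit the ad redirect.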

Then you only need to specify a few filters in an external data-mining tool (TextPipe) and you can get what you want, with an even better interface than Yahoo! provides.

--
Radek
Oleg Chernavin 03/29/2006 07:02 am
One idea is to verify that the MS IE Cookies setting is checked in the Options dialog of Offline Explorer. You can also have a new Project with the POST request created automatically: hold down Ctrl+Alt while submitting an HTML form (such as a logon form) in the Internal Browser of Offline Explorer Pro.

Oleg.