How to download all pages of messages in a Yahoo Group?

Author Message
Ryan 08/27/2015 01:05 am
Can anyone tell me the specific settings needed in OE to download all the message pages of a specific Yahoo Group?

Here is an example of a page for a single message in a group. Note that you must be signed in to Yahoo Groups for this page to load.

https://groups.yahoo.com/neo/groups/lowdosenaltrexone/conversations/messages/126482

What I want to be able to do is to save every message in the group.

For example, I would save the pages:

https://groups.yahoo.com/neo/groups/lowdosenaltrexone/conversations/messages/126482

and

https://groups.yahoo.com/neo/groups/lowdosenaltrexone/conversations/messages/126481

and

https://groups.yahoo.com/neo/groups/lowdosenaltrexone/conversations/messages/126480

etc.

(Note that the numbers are different at the end of each url indicating a different message in the group.)
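
The numbering pattern above can be sketched in Python. This is only an illustration and has nothing to do with how OE works internally: `message_urls` is a hypothetical helper, and it assumes that message numbers are plain integers counting down from the newest message (deleted messages may leave gaps, so some of the generated addresses may not exist).

```python
def message_urls(group, newest, oldest=1):
    """Build one URL per message number, newest first (hypothetical helper)."""
    base = f"https://groups.yahoo.com/neo/groups/{group}/conversations/messages/"
    # Count down from the newest message number to the oldest, inclusive.
    return [f"{base}{number}" for number in range(newest, oldest - 1, -1)]


# The three example pages listed above.
urls = message_urls("lowdosenaltrexone", newest=126482, oldest=126480)
for url in urls:
    print(url)
```

Each address still requires being signed in to Yahoo Groups as a member before it will load.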

It should be a simple job, but I don't know of any way to do it that actually works. Simply setting the starting URL to

https://groups.yahoo.com/neo/groups/lowdosenaltrexone/conversations/messages/

doesn't seem to work. Maybe because access to that url is restricted?

What specifically would need to be done to accomplish this task? Please provide a specific summary of all the critical steps/settings if possible and/or reveal the key to making this work. Thank you for any help!
Oleg Chernavin 08/27/2015 07:02 am
It doesn't allow me to browse this group. The site says:

==================
Oops!

You need to be a member to perform this action.
==================

If your membership allows that, please open this address in the Internal browser, log in to Yahoo Groups online, make sure you can see the messages, and then start downloading them. I tried this with another group:

https://groups.yahoo.com/neo/groups/Freecycle_Brampton/conversations/messages

and it worked - all messages were downloaded.

Best regards,
Oleg Chernavin
MP Staff
ryan 08/27/2015 10:27 am
Thank you for your reply!

I understand what you said, although I still have problems.

I have been able to download a few of the messages, but not the older messages.

Here is a question:

Say you are signed into YG in the OE browser and you want to download the entire Yahoo Group including conversations, files, photos, etc.

If you set the starting url as follows:

https://groups.yahoo.com/neo/groups/mygroup/

You can see that page in the browser, but when you start the project it generates the error "Address (URL) not found".

Why doesn't this work?
ryan 08/27/2015 11:49 am
I am just not having any luck no matter what I try. I can get a few of the messages, but I also get a lot of stuff I don't want. So after a very long download I get a few messages I want, and many files I don't want.

One thing I cannot figure out is how to make some of the URL filters work.

Here is an example:

1. Set starting URL to https://groups.yahoo.com/neo/groups/mygroup/conversations/messages/

2. Tick URL Filters/Directory
"Load only files within the starting directory and below."

3. Start a fresh download.

In addition to downloading files in groups.yahoo.com/neo/groups/mygroup/conversations/messages/

OE also downloads the files in other directories not even in the groups.yahoo.com root such as:

//https@csc.beap.bc.yahoo.com
//https@b.scorecardresearch.com

Why are these other files downloaded if they are not in the starting directory?

I apologize if my questions don't make sense. I am struggling to understand how this works... especially the URL filters section.

Ideally I would like to set the filters so that the ONLY files that download are the files in the /messages/ directory. But no matter how I set filters, I keep getting files in many other directories. I don't know how to control that.

Thanks again for any help!
Oleg Chernavin 08/27/2015 02:13 pm
Yes, there is no valid group address like that. For example, if you open the following address online, you get a 404 error:

https://groups.yahoo.com/neo/groups/Freecycle_Brampton/

The correct address is:

https://groups.yahoo.com/neo/groups/Freecycle_Brampton/conversations/messages

Set the filters as follows: URL Filters - Directory - check the "Load only from the starting directory" box. This should be enough.
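
To illustrate what a "starting directory" restriction means, here is a minimal Python sketch. It is not OE's actual code. It assumes the starting directory is everything in the starting URL up to its last slash, on the same server; `in_starting_directory` is a hypothetical name, and the `/beacon.js` path is made up for the example.

```python
from urllib.parse import urlsplit


def in_starting_directory(url, start_url):
    """Return True if url is on the same server and inside the start directory."""
    candidate, start = urlsplit(url), urlsplit(start_url)
    # Assumed rule: the directory is the path up to and including the last slash.
    start_dir = start.path[: start.path.rfind("/") + 1]
    same_server = candidate.netloc.lower() == start.netloc.lower()
    return same_server and candidate.path.startswith(start_dir)


start = "https://groups.yahoo.com/neo/groups/mygroup/conversations/messages/"

# A message page sits below the starting directory.
print(in_starting_directory(start + "126482", start))
# A tracking script on another server does not.
print(in_starting_directory("https://b.scorecardresearch.com/beacon.js", start))
```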

Some images, styles and scripts must also be loaded to display the downloaded pages properly. They are hosted in other directories and even on other servers, which is why other root folders appear.

If you don't want these folders and it is OK to view web pages offline without styles, go to File Filters, select the User Defined and Images categories, and set the Location field to "Load using URL Filters settings" for each.
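
The behaviour described here can be modelled roughly as follows. This Python sketch is a simplification, not OE's real logic: the extension list, the function names and the `support_follows_url_filters` flag (standing in for the "Load using URL Filters settings" choice) are all assumptions, and the `/beacon.js` path is invented for the example.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed set of "support file" types (images, styles, scripts).
SUPPORT_EXTENSIONS = {".css", ".js", ".gif", ".jpg", ".jpeg", ".png"}


def is_support_file(url):
    """Guess from the file extension whether a URL is an image, style or script."""
    extension = posixpath.splitext(urlsplit(url).path)[1].lower()
    return extension in SUPPORT_EXTENSIONS


def passes_url_filters(url, start_url):
    """Simplified URL filter: same server, path at or below the starting address."""
    candidate, start = urlsplit(url), urlsplit(start_url)
    same_server = candidate.netloc.lower() == start.netloc.lower()
    return same_server and candidate.path.startswith(start.path)


def should_download(url, start_url, support_follows_url_filters):
    """Support files bypass the URL filters unless told to follow them."""
    if is_support_file(url) and not support_follows_url_filters:
        return True
    return passes_url_filters(url, start_url)


start = "https://groups.yahoo.com/neo/groups/Freecycle_Brampton/conversations/messages"
script = "https://b.scorecardresearch.com/beacon.js"

print(should_download(script, start, support_follows_url_filters=False))
print(should_download(script, start, support_follows_url_filters=True))
```

In this model, the script from the other server is fetched with the flag off and skipped with it on, while message pages under the starting address are fetched either way.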

Oleg.
Ryan 08/27/2015 03:34 pm
Yes! Thank you Oleg!

I think this is the part I was missing.

I suspected that maybe there were things being loaded even though they didn't pass the filter rules because maybe they were needed as support files. But I didn't fully understand that until you just explained it.

I think that's the answer I am looking for! And I think the method you gave will work!

I was always confused about why files and directories were downloading even though the filters were set to prevent them. Now I finally understand!

Excellent! I think I understand it well enough now.

Thank you so much for the great help!!! :)
Oleg Chernavin 08/27/2015 03:43 pm
OK. Great that my advice is helpful! You are welcome!

Oleg.