problem: downloading myspace forum

Author Message
vlwow 08/20/2006 12:49 am
I'm using a trial version of OEE 4.3.2442 on Windows XP. I want to improve my English by reading posts written by native speakers, so I often read the forum on MySpace, but browsing it is rather slow, probably because of my internet connection. So I used OEE to download some of the posts for offline browsing. I downloaded from this page (http://forum.myspace.com/index.cfm?fuseaction=messageboard.viewCategory&CategoryID=57&Mytoken=A35162C3-2925-4302-951C3C88CB3D7B2761493156) and set the level value to 3 (other parameters unchanged). OEE downloaded more than 1700 files in all, but when I browse offline I can open only the first page of each thread, and clicking "next page" does nothing. The same happens when I try to go to the second page of the thread list.

Another question: yesterday I tried to download the whole movie sub-forum, which had over 70000 files to download. At first it ran smoothly (though OEE's CPU usage was a little high, from about 20% to 70%; I have a Celeron D 2.66 GHz CPU, by the way), but once more than 10000 files had been downloaded, my free memory began to drop dramatically until only a few thousand KB were left (I have 512 MB of RAM). In the end, when OEE had pulled 14000 or so files, a Windows error message appeared saying that my virtual memory's lower limit was set too low (I don't know what that means; my virtual memory size is 768 MB), and the project stopped.

Looking forward to your reply, thanks.
Oleg Chernavin 08/21/2006 12:00 pm
The scripts on that site are very complex, so it is a bit hard to get these links working. I suggest adding the following line to the URLs field of your Project and downloading it:

http://forum.myspace.com/index.cfm?fuseaction=messageboard.viewCategory&categoryID=57&groupID=0&Mytoken=F6C9CDBC-F852-4BDD-B09310BB7D3DB59E283208031&get=1&page={:1..47}&lastpagesent=0&keyword=film&xargstringp=&xargstringn=&EntryID=&stickydo=stickydo&NoDoSticky=no

Regarding the big forum - this probably happens because too many files now reside in a single folder on your hard disk. Please remove the downloaded files and allow directory overload protection in the Options dialog - File Locations section.

Best regards,
Oleg Chernavin
MP Staff
vlwow 08/22/2006 08:01 am
Thanks for replying. I did what you said, but it didn't fix the first problem. I tried OEE on another forum and everything went well, so it seems the MySpace forum is indeed a bit hard to tackle, just as you said. No matter: after some experimenting, I think I have figured out a little of how OEE works. Thanks again for your help ^_^
Oleg Chernavin 08/23/2006 12:56 pm
Can you please allow forms processing in the Project Properties dialog | Advanced section and download that Project again? I think the 1 2 3... 47 links will then start working offline.

Oleg.
vlwow 08/25/2006 07:16 am
You mean "explore html forms"? I allowed it, but it didn't work :( I checked the downloaded files (about 5000 of them; BTW I set the level limit to 3) and found that most of them were duplicates. After deleting those, only about 500 files were left, all of them first pages of threads. I can't figure out what went wrong.
vlwow 08/25/2006 07:31 am
By "first pages of threads", I meant the first page of posts of a thread.
Oleg Chernavin 08/26/2006 02:46 am
Well, I cannot check this right now. I will be back in my office in a week and will be able to work on this then. Can you please wait?

Thank you!

Oleg.
vlwow 08/26/2006 06:34 am
I think I have figured out why I got duplicate files. I noticed that if I open exactly the same page at different times, I get a different URL, because the "Mytoken=xxxxx" part of the URL varies each time. I guessed this was the reason, so I added a URL substitute rule replacing "Mytoken=xxxxx" with "", and it seems to have worked ^_^
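
For illustration, the effect of dropping the changing token can be sketched outside OEE in Python. This is only a minimal sketch of the idea, not OEE's rule syntax: the function name `strip_token` and the token values in the usage lines are made up for the example, and it assumes the token always arrives as an ordinary query parameter.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_token(url, param="Mytoken"):
    """Return url without the given query parameter (name compared case-insensitively)."""
    parts = urlsplit(url)
    # Keep every query pair except the per-visit token.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != param.lower()
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Two visits to the same page that differ only in the (hypothetical) token value.
first = "http://forum.myspace.com/index.cfm?fuseaction=messageboard.viewCategory&CategoryID=57&Mytoken=AAAA"
second = "http://forum.myspace.com/index.cfm?fuseaction=messageboard.viewCategory&CategoryID=57&Mytoken=BBBB"
same_page = strip_token(first) == strip_token(second)
```

Once the token is removed, both URLs reduce to the same string, so a downloader that keys files by URL would store the page once instead of once per token.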

I also found that, using the URL you gave me, the 1 2 3... 47 pages actually were downloaded; they just can't be reached offline by clicking 1 2 3... 47 on the page. Besides, no matter what level limit I set, OEE gets only the first page of posts for each thread. So it seems these JavaScript-controlled links really are a bit hard to handle on this site. Nonetheless, I can just open those 47 pages one by one and read the threads on them, even though not all posts are available for the longer threads, so it's no big deal.
Oleg Chernavin 08/26/2006 10:23 am
Yes, these sub-pages are hard to download, which is why I gave you the URL with the {:...} macro. The script on these pages modifies a form and submits it. I have not yet been able to make such links download automatically. It is not easy.
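
As a rough illustration of what the {:1..47} macro stands for, here is a minimal Python sketch that expands such a range into one URL per page. It assumes the range is inclusive at both ends (which matches the 1 2 3... 47 page links) and handles a single macro per URL; the function name `expand_range_macro` and the shortened example URL are hypothetical, and this is not how the program itself is implemented.

```python
import re
from typing import List

# Matches a numeric range macro of the form {:A..B}.
RANGE_MACRO = re.compile(r"\{:(\d+)\.\.(\d+)\}")


def expand_range_macro(template: str) -> List[str]:
    """Expand the first {:A..B} macro into one URL per integer from A to B inclusive."""
    match = RANGE_MACRO.search(template)
    if match is None:
        # No macro present: the template is already a plain URL.
        return [template]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = template[:match.start()], template[match.end():]
    return [prefix + str(number) + suffix for number in range(start, end + 1)]


# Shortened, hypothetical URL carrying the same page macro.
page_urls = expand_range_macro("http://example.com/index.cfm?get=1&page={:1..47}&lastpagesent=0")
```

Each resulting URL addresses one page of the thread list directly, so the pages can be fetched without following the script-driven 1 2 3... 47 links.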

Oleg.