Need Help with Downloading a Constantly Updated Massive Thread

Eric 04/13/2009 01:25 pm
I've been trying for a while, without much success, to mirror a massive thread I'm subscribed to (using Offline Explorer 5.4 Demo).
I'm afraid the site may go down one day, so I want to make a copy for offline browsing
and have it run from time to time to update the copy with the thread's new comments.

The site is powered by vBulletin® Version 3.8.1, vBSEO 3.2.0,
and the thread keeps growing, with its page URLs in this format:

www.xxx.org/xxx-xxx/123456-xxx-xxx-xxx-xxx-xxx.html
www.xxx.org/xxx-xxx/123456-xxx-xxx-xxx-xxx-xxx-2.html
www.xxx.org/xxx-xxx/123456-xxx-xxx-xxx-xxx-xxx-22.html
www.xxx.org/xxx-xxx/123456-xxx-xxx-xxx-xxx-xxx-126.html

The thread is full of content, and some posts link to other sites, mainly for comments.
This is where it gets really difficult for me: I want the download to stay on the thread only
and get all the comments as the thread keeps expanding, along with all its extra content (attachments).
Some of the content is small thumbnail pictures leading to a larger one, most of the time on the same domain
but sometimes on an external host.

Some content can only be downloaded with a member login.
I've read all the support threads and was able to somehow grab files with a login,
but it gets out of control and many things don't get downloaded. I've heard and read nothing but great things about this product,
have been playing with the demo version, and feel very comfortable with it
compared to other programs I've been testing for this purpose.
Any help in getting this to work is much appreciated.

Thanks in Advance
Eric
Eric 04/13/2009 01:30 pm
I was able to do this partially by building the whole URL list and having it parsed that way.
The problem is that the thread keeps updating: unless I update the list manually,
the software will not follow and download the next page.
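
One way to avoid maintaining the list by hand might be to read the highest page number out of the pagination links on a page that is already saved. The Python sketch below assumes the page links follow the URL format shown earlier in the thread; `highest_page` and the sample HTML are hypothetical and only illustrate the idea, they are not part of Offline Explorer.

```python
import re

def highest_page(html: str, thread_slug: str) -> int:
    """Return the largest page number linked for the thread, or 0 if none is found.

    Assumes page 1 is "<slug>.html" and later pages are "<slug>-<n>.html".
    """
    pattern = re.compile(re.escape(thread_slug) + r"(?:-(\d+))?\.html")
    pages = [int(m.group(1)) if m.group(1) else 1 for m in pattern.finditer(html)]
    return max(pages, default=0)

# Hypothetical snippet of a saved page with pagination links.
sample_html = """
<a href="www.xxx.org/xxx-xxx/123456-xxx-xxx-xxx-xxx-xxx.html">1</a>
<a href="www.xxx.org/xxx-xxx/123456-xxx-xxx-xxx-xxx-xxx-2.html">2</a>
<a href="www.xxx.org/xxx-xxx/123456-xxx-xxx-xxx-xxx-xxx-126.html">Last</a>
"""
last_page = highest_page(sample_html, "123456-xxx-xxx-xxx-xxx-xxx")
print(last_page)
```

The result could then be used to rebuild the URL list before each update run.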
Oleg Chernavin 04/14/2009 05:12 am
I think you need to create a Project with the URL:

www.xxx.org/xxx-xxx/123456-xxx-xxx-xxx-xxx-xxx-{:1..126}.html

The {:...} is a URL Macro that will load all 126 URLs. Adjust the numbers to the range you need. Set Level to 1 to load all linked files. Also, log on to this forum in the Internal browser of Offline Explorer before downloading the Project. Then start the download.

You may set the upper limit to something high instead of the real 126. Some extra pages will be loaded but they will be either empty or sent with an error by the server.
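
To make the effect of the range macro concrete, here is a minimal Python sketch, assuming the {:1..126} part simply expands to every integer in the range. `expand_url_macro` is a hypothetical helper written for illustration; it is not Offline Explorer's own code.

```python
import re

# Matches a single {:start..end} range, e.g. {:1..126}.
MACRO = re.compile(r"\{:(\d+)\.\.(\d+)\}")

def expand_url_macro(template: str) -> list[str]:
    """Expand one {:start..end} range in the template into concrete URLs."""
    match = MACRO.search(template)
    if match is None:
        return [template]  # no macro present, nothing to expand
    start, end = int(match.group(1)), int(match.group(2))
    head, tail = template[:match.start()], template[match.end():]
    return [f"{head}{n}{tail}" for n in range(start, end + 1)]

urls = expand_url_macro(
    "www.xxx.org/xxx-xxx/123456-xxx-xxx-xxx-xxx-xxx-{:1..126}.html"
)
print(len(urls))
print(urls[0])
print(urls[-1])
```

Note that in the listing at the top of the thread the first page has no numeric suffix, so whether the generated "-1.html" address resolves depends on the server.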

Best regards,
Oleg Chernavin
MP Staff
Eric 04/14/2009 02:18 pm
Thank you Oleg, that's very helpful and exactly what I was looking for.
I was trying to figure out a way to set this up so future pages are included automatically.
I'll try adding extra blank pages to the {:1..126} sequence
and report my results back as soon as I get a chance.
Thanks again.

Best regards

Eric
Oleg Chernavin 04/14/2009 02:29 pm
You are welcome!

Oleg.
Eric 04/15/2009 10:17 am
> I think you need to create a Project with the URL:
>
> www.xxx.org/xxx-xxx/123456-xxx-xxx-xxx-xxx-xxx-{:1..126}.html
>
> Best regards,
> Oleg Chernavin
> MP Staff

Good morning Oleg,
I followed your advice with partial success.
All pages in the range were downloaded with your recommendations,
but many of the site tags and connecting pages were not;
e.g. "Show Printable Version" and individual posts were not captured.

I ended up getting close to the results I was looking for
in a somewhat strange way. Here is what I did:
I ended up not using the macro {:1..126} while still capturing every page in the range.
I believe future pages will load too.

Since the recent vBulletin board updates, the URLs no longer
show the thread number, but rather the thread name.
Adding that under URL Filters > Filename as "thread-name-added-like-this" worked great.
I then had an issue with site images, icons and avatars not downloading,
so I added *.gif and *.jpg in the same place.
Now most site icons and images are downloaded
(I thought this was supposed to be set at the File Filters level),
except for avatars that are not in the site's default pack.
Any avatar that is custom made and uploaded I am unable to capture at this point.
Everything else is working perfectly this way now, without restricting or adding page numbers.
The config is as follows (I still need to figure out how to grab the missing custom avatars):

Project: no level limit, everything else default
File Filters: all default
URL Filters: as mentioned above
Omissions: added a few addresses to avoid advertisements
Protocol: custom > http
Server: checked > load files from the following servers > checked - Domain (rest default)
Directory: nothing is checked
Filename: as mentioned above

Content filter: all default

Advanced: all default except for
Parsing:
check file integrity > checked
load server side images > checked
suppress website errors > checked
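
As a rough illustration of how the URL filter settings above combine, here is a Python sketch of the decision they describe: http only, same domain, omitted hosts skipped, and a file name that matches one of the listed patterns. This is only a model of the configuration, not how Offline Explorer evaluates filters internally, and every value in it (`ALLOWED_DOMAIN`, `OMITTED_HOSTS`, the patterns) is a placeholder.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Placeholder values standing in for the real site's settings.
ALLOWED_DOMAIN = "xxx.org"
OMITTED_HOSTS = {"ads.example.com"}
FILENAME_PATTERNS = ("*thread-name-added-like-this*", "*.gif", "*.jpg")

def should_download(url: str) -> bool:
    """Return True if the URL passes the protocol, server and filename filters."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "http":          # Protocol: custom > http
        return False
    if host in OMITTED_HOSTS:            # Omissions: advertisement hosts
        return False
    if host != ALLOWED_DOMAIN and not host.endswith("." + ALLOWED_DOMAIN):
        return False                     # Server: stay on the domain
    filename = parsed.path.rsplit("/", 1)[-1].lower()
    return any(fnmatchcase(filename, p) for p in FILENAME_PATTERNS)

print(should_download("http://www.xxx.org/xxx-xxx/thread-name-added-like-this-2.html"))
print(should_download("http://www.xxx.org/images/icons/smile.gif"))
print(should_download("http://other.example.net/pic.jpg"))
```

Under this sketch, anything whose file name matches none of the patterns is skipped, even on the allowed domain.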

Thank you in advance

Best Regards
Eric
Oleg Chernavin 04/15/2009 01:29 pm
Well, you could use File Filters - Images and select "Load from any site" in the Location box. If some other links cannot be loaded, let me know real examples of the links and I will tell you what to change.

Oleg.
Eric 04/15/2009 01:38 pm
> Well, you could use File Filters - Images and select "Load from any site" in the Location box. If some other links cannot be loaded, let me know real examples of links and I will tell what to change.
>
> Oleg.

Thank you Oleg, I'll give it a try.
I would love to send you actual site examples.
Is there a private way we can do this?
Thank you in advance

Best regards

Eric
Eric 04/15/2009 01:40 pm
> Well, you could use File Filters - Images and select "Load from any site" in the Location box.
>
> Oleg.

I checked: it is already set to "Load from any site"!

Best Regards

Eric
Oleg Chernavin 04/15/2009 02:09 pm
Please send it to support@metaproducts.com (support will forward it to me later), or I can write to you myself using the address you left in the forum.

Oleg.
Eric 04/15/2009 02:24 pm
> Please send it to support@metaproducts.com (it will be resent to me by support later) or I can write you myself using the address you left in the forum.
>
> Oleg.

Thank you Oleg,
please use the address posted on the forum
Thank you in Advance

Best Regards

Eric