How to download this site?

Author Message
newbiii 02/05/2008 07:36 am
Hi,

The site is www.onemanga.com. What I'd like to do is download ALL pages (text and images) of a SINGLE 'title' that I choose. The site is split across two servers (by name): text stays under www.onemanga.com/[title subdirectory], while images are linked from img[#].onemanga.com/[title numbered subdirectory].

For example, for "Suzuka":
text -> http://www.onemanga.com/Suzuka/[page subdirs]
images -> http://img30.onemanga.com/mangas/00000035/[page subdirs with different naming]

Including both 'www.onemanga.com' and 'img30.onemanga.com' as servers, together with 'Suzuka' and '00000035' as directories, still doesn't download ALL the linked images. A simpler approach would be to widen the filters and download more 'titles' at once, but the site is "huge" and that is unnecessary.

I guess there must be some setting I'm missing, yet I can't figure out which one. Can anyone please help me pinpoint it?
Oleg Chernavin 02/05/2008 07:58 am
I think you can limit the download to the starting server/directory in URL Filters and use File Filters - Images to allow them to be loaded from any site in the Location box.
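
As a rough illustration of the rule described above (pages restricted to the starting server/directory, images allowed from any site), here is a minimal Python sketch. It is only a model of the idea, not Offline Explorer's actual filter logic; the function name `is_allowed`, the extension list, and the image file name used in the usage note are assumptions made for the example.

```python
from urllib.parse import urlparse

# Assumption: images are recognized by file extension only.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def is_allowed(url: str, start_url: str) -> bool:
    """Pages must stay under the start server/directory; images may come from any host."""
    target = urlparse(url)
    start = urlparse(start_url)

    # Images: accepted regardless of which server hosts them.
    if target.path.lower().endswith(IMAGE_EXTENSIONS):
        return True

    # Everything else: same server, and at or below the starting directory.
    same_host = target.netloc.lower() == start.netloc.lower()
    return same_host and target.path.startswith(start.path)
```

With `http://www.onemanga.com/Suzuka/` as the starting URL, this sketch accepts pages below `/Suzuka/` and a (hypothetical) image such as `http://img30.onemanga.com/mangas/00000035/01.jpg`, while rejecting pages of other titles on the same server.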

Best regards,
Oleg Chernavin
MP Staff
newbiii 02/05/2008 09:54 am
> I think, you can limit the download to the starting server/directory in URL Filters and use File Filters - Images to allow them to be loaded from any site in the Location box.
>
> Best regards,
> Oleg Chernavin
> MP Staff

Thanks for the reply, but the first thing you recommended is already in place. The settings are:

url: http://www.onemanga.com/Suzuka/
level limit: none
file filters: all categories (text/image/etc.) left at default (only images matter here)
url filters: none / all protocols
servers: starting domain OR custom with www.onemanga.com and img30.onemanga.com 'included' (same result either way)
directory: start and below OR custom with "Suzuka" and "00000035" (numeric equivalent on image server) keywords (same result either way)
filename: all
content filters: none
all other options to defaults.

As can be seen, it should have been easy, yet the results are wrong because some "images" are skipped. With Firefox etc. everything is fine, but Offline Explorer, perhaps due to wrong options, really misses some of the images. I think the problem is that the source URLs (all ending with a "/", with no actual HTML file) DON'T create any files on disk at all (they are just parsed). So maybe the image extraction somehow misjudges them?...

You can try it for yourself with the smallest number of pages:
url: http://www.onemanga.com/Tokyo_Ants/ (4 chapters)
with image counterpart http://img.onemanga.com/mangas/00000364/[pages...]

P.S.: There's also an annoying google.js that I can't get rid of, for fear of messing up project settings that already don't work.
Oleg Chernavin 02/06/2008 06:05 am
I created this Project and it loaded all images correctly. You can test why a certain URL cannot be loaded using the Test button in Properties - URL Filters section.
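
To illustrate the kind of answer such a test gives, the sketch below checks a URL against an included-servers list and a set of directory keywords and reports which rule rejects it. This is a hypothetical Python model only: it is not how the Test button is implemented, and the function name `explain` and its parameters are assumptions made for the example.

```python
from urllib.parse import urlparse


def explain(url, servers, directory_keywords):
    """Return a human-readable reason why `url` passes or fails the filters."""
    parts = urlparse(url)
    host = parts.netloc.lower()

    # Rule 1: the server must be one of the explicitly included hosts.
    if host not in servers:
        return f"rejected: server '{host}' is not in the included list"

    # Rule 2: the path must contain at least one included directory keyword.
    path = parts.path.lower()
    if not any(keyword.lower() in path for keyword in directory_keywords):
        return f"rejected: directory '{parts.path}' matches no included keyword"

    return "accepted"
```

For example, with servers `{"www.onemanga.com", "img30.onemanga.com"}` and keywords `["Suzuka", "00000035"]`, a URL on `img.onemanga.com` is reported as rejected by the server rule, while `http://www.onemanga.com/Tokyo_Ants/` is rejected by the directory rule.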

Oleg.
newbiii 02/07/2008 05:29 am
> I created this Project and it loaded all images correctly. You can test why a certain URL cannot be loaded using the Test button in Properties - URL Filters section.
>
> Oleg.

That "TEST BUTTON" is simply GREAT!!! I pinpointed my own problem, so thank you very much. I never realized such a great feature was simply sitting there to help users. Thanks again... ;-)
Oleg Chernavin 02/07/2008 05:45 am
You are welcome!

Oleg.