Downloading Wikipedia image pages?

Author Message
Jake 12/14/2006 09:43 am
Hi ya all!

I'm trying to download all of the image pages from Wikipedia, but I'm stuck on one problem. I only want the "Image~" pages (every one of them has the "Image~" tag) with the thumbs on them, and NOT the HUGE original pictures that those pages link to. Look here:

I want pages like this one (http://static.wikipedia.org/wikipedia/en/t/h/e/Image~The_Earth_seen_from_Apollo_17.jpg_1856.html), but files like this one also get downloaded (http://static.wikipedia.org/wikipedia/en/upload/shared/9/97/The_Earth_seen_from_Apollo_17.jpg).

To make this a little more complicated :)

There are also pages like this one that don't have a bigger version of the image (http://static.wikipedia.org/wikipedia/en/p/l/u/Image~Pluto.jpg_5480.html), BUT I want these as well.

I'm going out of my mind trying all the possible filter combinations...
Oleg Chernavin 12/14/2006 11:07 am
What about setting a size limit in File Filters - Images? I think 100 KB will be enough.

Best regards,
Oleg Chernavin
MP Staff
Jake 12/14/2006 11:39 am
> What about setting a size limit in File Filters - Images? I think 100 KB will be enough.
>
> Best regards,
> Oleg Chernavin
> MP Staff

That will work with some images, but there are plenty of pics where the thumb is around 100 KB and the original around 150 KB. There are also pics that only have a thumb, and those can be bigger (>150 KB) than many of the ones that come in two sizes.
Oleg Chernavin 12/14/2006 11:51 am
OK. What about dimensions (like 200x200 max)?

Oleg.
Jake 12/14/2006 12:02 pm
> OK. What about dimensions (like 200x200 max)?
>
> Oleg.

No. That won't work either.
Oleg Chernavin 12/14/2006 12:27 pm
OK. What about using URL Filters - Filename included keywords:

*.html
http://*/*thumb*/*

Oleg.
Jake 12/14/2006 12:42 pm
> OK. What about using URL Filters - Filename included keywords:
>
> *.html
> http://*/*thumb*/*
>
> Oleg.

Then it won't load the images that only have thumb pics, like this one: http://static.wikipedia.org/wikipedia/en/p/l/u/Image~Pluto.jpg_5480.html

Isn't there any way to make a custom filter like this (http://static.wikipedia.org/wikipedia/en/*/*/*/Image~*.html)?
Oleg Chernavin 12/14/2006 12:50 pm
> Isn't there any way to make a custom filter like this (http://static.wikipedia.org/wikipedia/en/*/*/*/Image~*.html)?

You can, but this will only load HTML pages, not images. Sorry, I don't have any other good ideas.

Oleg.
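
The custom filter discussed above can be tried out on its own before committing to a long download. The following is only a rough sketch in Python, not the downloader's filter engine: it assumes that each "*" in the pattern stands for exactly one path segment (no "/" inside it), and the function name is_image_page is made up for illustration. The program's real filter syntax may treat "*" differently.

```python
import re

# Assumption: each "*" in the forum pattern covers one path segment,
# so it is translated to "[^/]+" (or "[^/]*" inside the file name).
IMAGE_PAGE = re.compile(
    r"^http://static\.wikipedia\.org/wikipedia/en/"
    r"[^/]+/[^/]+/[^/]+/Image~[^/]*\.html$"
)


def is_image_page(url: str) -> bool:
    """Return True for "Image~" description pages, False for anything else."""
    return IMAGE_PAGE.match(url) is not None


# The three URLs quoted earlier in this thread.
urls = [
    "http://static.wikipedia.org/wikipedia/en/t/h/e/"
    "Image~The_Earth_seen_from_Apollo_17.jpg_1856.html",
    "http://static.wikipedia.org/wikipedia/en/p/l/u/Image~Pluto.jpg_5480.html",
    "http://static.wikipedia.org/wikipedia/en/upload/shared/9/97/"
    "The_Earth_seen_from_Apollo_17.jpg",
]

for url in urls:
    print(is_image_page(url), url)
```

Both "Image~" pages pass the check and the original picture under /upload/ does not. As noted above, a rule like this selects only the HTML pages, so the thumbs shown on those pages would still need a rule of their own.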
Jake 12/14/2006 12:53 pm
> > Isn't there any way to make a custom filter like this (http://static.wikipedia.org/wikipedia/en/*/*/*/Image~*.html)?
>
> You can, but this will only load HTML pages, not images. Sorry, I don't have any other good ideas.
>
> Oleg.

Thanks anyway.

I'll try to think of something... if nothing else works, I'll just download them all :)
Oleg Chernavin 12/14/2006 12:55 pm
Yes, you will have to....

Oleg.
Dmitry 01/22/2007 08:04 am
Just sharing some knowledge.
Regarding this specific resource: Wikipedia makes its whole content available for download as a single archive or as several archives, so you can download all of the current Wikipedia without browsing it.
And it's free.


For images this is:

http://download.wikimedia.org/images/
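
If you go the archive route, fetching a large file can be scripted. Here is a minimal sketch in Python using only the standard library. The function name fetch and the archive name in the comment are placeholders, not real names from the listing above; pick an actual file from that directory before running it.

```python
import shutil
import urllib.request


def fetch(url: str, dest: str, chunk_size: int = 1024 * 1024) -> int:
    """Stream url to the file dest in chunks; return the bytes written."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy chunk by chunk so a large archive is never held in memory.
        shutil.copyfileobj(response, out, chunk_size)
        return out.tell()


# Example call (hypothetical archive name, replace with a real one):
# fetch("http://download.wikimedia.org/images/<archive-name>", "images-archive")
```

Copying in chunks keeps memory use flat no matter how big the archive is. The sketch does not resume interrupted transfers, so for very large files a download manager may still be the better choice.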