I'm trying to download all of the image pages from Wikipedia, but I'm struggling with this problem. I only want the "Image~" pages (every one of them has "Image~" in its URL) with the thumbnails on them, and NOT the HUGE original pictures that are linked from the page. Look here:
I want this kind of page (http://static.wikipedia.org/wikipedia/en/t/h/e/Image~The_Earth_seen_from_Apollo_17.jpg_1856.html), but these also got downloaded (http://static.wikipedia.org/wikipedia/en/upload/shared/9/97/The_Earth_seen_from_Apollo_17.jpg).
To make this a little more complicated :)
There are also pages like this one that don't have a bigger image (http://static.wikipedia.org/wikipedia/en/p/l/u/Image~Pluto.jpg_5480.html), BUT I want these too.
Now I'm going out of my mind with all the possible filter combinations...
> Best regards,
> Oleg Chernavin
> MP Staff
That will work for some images, but there are plenty of pics where the thumb is about 100KB and the original about 150KB. There are also pics that only have a thumb but are bigger (>150KB) than many of those with two sizes.
No. That won't work either.
Then it won't load the images that only have a thumbnail: http://static.wikipedia.org/wikipedia/en/p/l/u/Image~Pluto.jpg_5480.html
Isn't there any way to make a custom filter like this (http://static.wikipedia.org/wikipedia/en/*/*/*/Image~*.html)?
You can, but this will only load the HTML pages, not the images. Sorry, I don't have any other good ideas.
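To see what a filter like `http://static.wikipedia.org/wikipedia/en/*/*/*/Image~*.html` would and would not catch, here is a minimal Python sketch. It assumes that `*` matches any run of characters within a single path segment (no `/`); Offline Explorer's own filter syntax may treat `*` differently. The helper `wildcard_to_regex` is hypothetical, written only for this illustration. With that assumption, both "Image~" pages (Earth and Pluto) match and the original under `/upload/shared/` does not, which is why such a filter keeps just the HTML pages and not the picture files themselves.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    # Hypothetical helper. Assumption: '*' matches any characters except '/',
    # i.e. it never spans more than one path segment.
    parts = (re.escape(p) for p in pattern.split("*"))
    return re.compile("^" + "[^/]*".join(parts) + "$")


# The filter proposed in the thread.
IMAGE_PAGE = wildcard_to_regex(
    "http://static.wikipedia.org/wikipedia/en/*/*/*/Image~*.html"
)

urls = [
    # Image page that also links to a big original.
    "http://static.wikipedia.org/wikipedia/en/t/h/e/Image~The_Earth_seen_from_Apollo_17.jpg_1856.html",
    # Image page with no bigger original.
    "http://static.wikipedia.org/wikipedia/en/p/l/u/Image~Pluto.jpg_5480.html",
    # The HUGE original picture that should be skipped.
    "http://static.wikipedia.org/wikipedia/en/upload/shared/9/97/The_Earth_seen_from_Apollo_17.jpg",
]

for url in urls:
    # Print whether each URL would pass the filter.
    print(bool(IMAGE_PAGE.match(url)), url)
```

Note that the thumbnails shown on those pages are separate image URLs, so they fall outside this pattern and would need a filter rule of their own.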
I'll try to think of something... If nothing else works, I'll just download them all :)
For this specific resource, Wikipedia makes it possible to download the whole content as a
single archive or as several archives. So you can download all of the current Wikipedia without browsing it.
And it's free.
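Dump archives like these can be very large, so it helps to stream them to disk instead of reading them into memory. A minimal Python sketch using only the standard library follows; the function name `download_archive` and the example URL in the usage line are placeholders, not real Wikipedia dump locations.

```python
import shutil
import urllib.request
from pathlib import Path


def download_archive(url: str, dest: Path, chunk_size: int = 1 << 20) -> int:
    """Stream `url` to `dest` in chunks and return the number of bytes written.

    Hypothetical helper: copying chunk by chunk keeps memory use flat,
    even for multi-gigabyte archive files.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy at most `chunk_size` bytes at a time from the response to the file.
        shutil.copyfileobj(response, out, chunk_size)
    return dest.stat().st_size
```

Usage (placeholder URL): `download_archive("https://example.org/path/to/archive", Path("dumps/archive"))`. For a download of this size, a tool that can resume interrupted transfers may be more practical.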
For images this is: