What to do about websites that are available under several near-identical addresses?
I am thinking of adding a feature (maybe optional) to handle such sites. An obvious example is www.site.com and site.com serving exactly the same content.
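If such a feature were added, one simple approach would be to normalize each URL before queuing it, so that www.site.com and site.com map to the same key. Below is a minimal Python sketch. It assumes both hosts really serve identical content, and canonical_url and dedupe are hypothetical helper names, not part of any existing tool:

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Map www.site.com and site.com variants of a URL to one key.

    Assumption: the www and bare hosts serve identical content.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Keep an explicit port if the URL had one
    netloc = f"{host}:{parts.port}" if parts.port else host
    # Treat an empty path as the site root; drop the #fragment
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))


def dedupe(urls):
    """Keep the first URL seen for each canonical key, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

With this, http://www.site.com/a and http://site.com/a would be downloaded only once. It would not help when the two hosts actually differ in content.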
Is there any way to tell whether I should have chosen this one? http://www.d20pfsrd.com
Maybe the site really is that huge? I checked the files, and the type taking up the most space (64,8 of the total) is files with no extension at all, 21,300 of them.
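To see which file types take up the space, a rough Python sketch like this could tally a finished download folder by extension. space_by_extension and share_of_space are hypothetical names, and the root folder is wherever the site was saved:

```python
import os
from collections import Counter


def space_by_extension(root):
    """Return {extension: (file_count, total_bytes)} for every file under root.

    The key "" collects files that have no extension.
    """
    counts = Counter()
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            counts[ext] += 1
            sizes[ext] += os.path.getsize(os.path.join(dirpath, name))
    return {ext: (counts[ext], sizes[ext]) for ext in counts}


def share_of_space(stats, ext):
    """Percentage of total bytes taken by files with the given extension."""
    total = sum(size for _count, size in stats.values())
    if total == 0:
        return 0.0
    return 100.0 * stats.get(ext, (0, 0))[1] / total
```

Sorting the result by total bytes shows which file types would be worth filtering first.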
Maybe exclude them using URL Filters - Filename - Excluded list:
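As an illustration only, an exclude rule of that kind might behave roughly like the sketch below. This is an assumption about the logic, not the program's actual implementation, and is_excluded is a hypothetical name. One caution: on many CMS-driven sites the extensionless URLs are the HTML pages themselves, so excluding them can remove most of the content you want.

```python
import fnmatch
import posixpath
from urllib.parse import urlsplit


def filename_of(url):
    """Last path segment of a URL, e.g. 'map.png' for /img/map.png."""
    return posixpath.basename(urlsplit(url).path)


def is_excluded(url, excluded_patterns=(), exclude_no_extension=False):
    """Hypothetical filename filter: wildcard patterns plus an optional
    'no extension' rule. Directory-style URLs ending in '/' are never
    treated as extensionless."""
    name = filename_of(url)
    if exclude_no_extension and name and "." not in name:
        return True
    # fnmatchcase keeps matching case-sensitive on every OS
    return any(fnmatch.fnmatchcase(name, p) for p in excluded_patterns)
```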
If you download only one of them, you miss a lot. The problem is that it's too big: downloading just the first one came to 5 GB, and I have no idea how large the second one will be.
I don't get it. 50,000+ files and still missing things because they are on the second website. Every other site I have downloaded was between 10 MB and 200 MB at most.
All downloaded using these settings:
- Load using file filter settings
- Load files only with the starting Domain
- Enter multiple server keywords
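Roughly, the last two options decide which hosts the crawler is allowed to follow. Here is a hypothetical Python approximation. should_follow is an illustrative name, and the real program's matching rules may well differ:

```python
from urllib.parse import urlsplit


def should_follow(url, start_domain, server_keywords=()):
    """Follow a link if its host is exactly the starting domain, or if the
    host contains any of the extra server keywords (plain substring match,
    so a keyword like 'site.com' would also match 'notsite.com')."""
    host = (urlsplit(url).hostname or "").lower()
    if host == start_domain.lower():
        return True
    return any(k.lower() in host for k in server_keywords)
```

In this sketch, starting from www.d20pfsrd.com alone would skip links to d20pfsrd.com, while adding "d20pfsrd.com" as a keyword would let both hosts through.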
But I am having trouble deciding what else to exclude. Most pages are text with the occasional image, so it should not be this large.