Single Website, double link confusion

Author Message
Jim Smith 10/17/2012 10:50 am
Is http://www.d20pfsrd.com/home the same as https://sites.google.com/site/pathfinderogc/ as far as OE is concerned? They seem to be the same website, but is there a difference between the d20 address and the pathfinderogc one? Will it make a difference in OE?
What should I do with websites that duplicate each other like this?
Oleg Chernavin 10/17/2012 01:33 pm
These are two different sites for Offline Explorer, because it distinguishes sites by their URLs, not by their contents.

I am thinking of adding a feature (maybe optional) to handle such sites. An obvious example is www.site.com and site.com with exactly the same contents.

Best regards,
Oleg Chernavin
MP Staff
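
To illustrate the www.site.com versus site.com case, here is a minimal Python sketch of comparing URLs by host while ignoring a leading "www.". It is not how Offline Explorer works internally; the function names site_key and same_site are made up for this example.

```python
from urllib.parse import urlsplit


def site_key(url: str) -> str:
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host


def same_site(a: str, b: str) -> bool:
    """Treat two URLs as the same site when their host keys match."""
    return site_key(a) == site_key(b)
```

With this rule, http://www.site.com/ and http://site.com/ compare as the same site, while http://www.d20pfsrd.com/home and https://sites.google.com/site/pathfinderogc/ still count as two different sites, because their hosts differ even if the pages look alike.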
Jim Smith 10/17/2012 04:34 pm
I have tried the https@sites.google.com version and it is currently at 3.67 GB with no sign of coming to an end. The map has many sites, though only the main site takes up so much space and everything is being downloaded from it.

Is there any way to tell if I should have chosen this one instead? http://www.d20pfsrd.com
Oleg Chernavin 10/17/2012 04:45 pm
You should use URL Filters - Directory section to allow downloading from the starting directory only. Please also go through the File Filters section and select "Load using URL Filters" in their Location boxes.

Oleg.
Jim Smith 10/17/2012 04:50 pm
I already have this ticked: Properties -> URL Filters -> Directory -> Load files only within the starting directory and below

Maybe the site really is that huge? I checked the files, and most of the ones taking up space (64.8 of the total), 21,300 in number, have no file extension whatsoever.
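
A breakdown like this can be produced outside OE with a short Python sketch that walks the download folder and totals file count and size per extension. It assumes the downloaded files sit in an ordinary folder on disk; the function name summarize_by_extension is hypothetical.

```python
import os
from collections import defaultdict


def summarize_by_extension(root: str) -> dict:
    """Map each extension ('' for files without one) to (count, total bytes)."""
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # splitext returns '' as the extension when the name has none.
            ext = os.path.splitext(name)[1].lower()
            totals[ext][0] += 1
            totals[ext][1] += os.path.getsize(os.path.join(dirpath, name))
    return {ext: (count, size) for ext, (count, size) in totals.items()}
```

The entry under the empty string covers the files with no extension, which on a site like this are most likely ordinary pages rather than junk.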
Jim Smith 10/17/2012 05:06 pm
I followed your suggestion and selected "Load using URL Filters" for each file type. I will restart the download now.
Oleg Chernavin 10/19/2012 08:24 am
I see a number of strange links, like:

https://sites.google.com/site/pathfinderogc/classes/3rd-party-prestige-classes/alluria-publishing/pharaoh?tmpl=%2Fsystem%2Fapp%2Ftemplates%2Fprint%2F

Maybe exclude them using URL Filters - Filename - Excluded list:

?tmpl=

Oleg.
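
As a rough picture of what such an exclusion does, here is a Python sketch that skips any URL containing an excluded string. It assumes plain substring matching, which may differ from how Offline Explorer actually evaluates its Excluded list; EXCLUDED_SUBSTRINGS and is_excluded are names invented for this example.

```python
# Assumption: an excluded entry matches when it appears anywhere in the URL.
EXCLUDED_SUBSTRINGS = ["?tmpl="]


def is_excluded(url: str, excluded=EXCLUDED_SUBSTRINGS) -> bool:
    """Return True when the URL contains any of the excluded strings."""
    return any(fragment in url for fragment in excluded)
```

Under that assumption, the print-template link above would be skipped, while the plain .../alluria-publishing/pharaoh page would still be downloaded.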
Jim Smith 10/20/2012 11:20 am
These two websites seem to be one site, or dependent on each other. Some things are only on the second one, while others are only on the first.
If you download only one of them, you miss a ton of things. The problem is that it is too big. Just downloading the first one took 5 GB, and I have no idea how much the second one will be.
I don't get it. 50,000+ files and it is still missing things, because they are on the second website. Every other site I have downloaded is between 10 MB and 200 MB at most.
Jim Smith 10/20/2012 11:49 am
I am thinking of these settings:

Project:
http://www.d20pfsrd.com/
https://sites.google.com/site/pathfinderogc/

File Filters:
All using Load using file filter settings

URL Filters:

Load files only with the starting Domain

Enter multiple server keywords

Excluded: ?tmpl=


But I am having trouble working out what else to exclude. Most pages are text with the occasional image. It should not be so large.
Oleg Chernavin 10/21/2012 08:57 am
Can you watch the Queue tab while downloading? Perhaps there are other kinds of useless URLs that can be excluded to minimize the download.

Oleg.
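
If the queued URLs can be copied out as plain text (an assumption; I am not describing a specific export feature here), a small Python sketch like the following could help spot repeating patterns by counting which query parameters appear most often. The function name count_query_keys is hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_query_keys(urls):
    """Count how many times each query parameter name appears across the URLs."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        for key, _value in parse_qsl(query, keep_blank_values=True):
            counts[key] += 1
    return counts
```

A parameter that shows up thousands of times, such as tmpl in the print links mentioned earlier, is a good candidate for the Excluded list.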
Jim Smith 10/21/2012 09:16 am
I can, but I have difficulty telling which URLs are useless and which are not. How can I exclude files with no file extension? Can I? Will the site function without them?
Oleg Chernavin 10/21/2012 10:44 am
Just post a few examples here that seem suspicious to you. I will look at them and advise you.

Oleg.