Quick question

Peter Harkness 02/04/2011 04:12 am
If I untick "text" in the project properties, I know that text files (HTML etc.) are still retrieved for parsing but are not saved. However, with "text" unticked it looks as though the URL filters are not applied.

I want to download from a single server and not follow links to other servers, BUT I want to leave "text" unticked, i.e. NOT save text files. However, I find that in this case the project accesses every server it can see by following links.

Is there a way to restrict a project to ONE server whilst "text" is unticked in the properties?

Thanks

Oleg Chernavin 02/04/2011 05:51 am
Please check the File Filters - Text category and set its Location field to either "Load using URL Filters" or "Load from the starting server". Then uncheck the Text category again.

Best regards,
Oleg Chernavin
MP Staff
Peter Harkness 02/04/2011 09:48 am
That's what I did. I came in this morning to find google, bbc and about 30 other sites under the project folder!

Oleg Chernavin 02/04/2011 09:53 am
Perhaps they are scripts or styles handled by the File Filters - User Defined category?

Oleg.
Peter Harkness 02/04/2011 10:34 am
All categories are set to use URL filters. The "server" URL filter is set to the starting domain only.
Oleg Chernavin 02/06/2011 02:42 pm
Can you post the settings of the Project here (Ctrl+C on it and paste to the message)? I will try to download myself and see what happens.

Oleg.
Peter Harkness 02/09/2011 01:23 pm
Oleg,

The link is internal but I have found the following:

For a project url of http://www.mycompany.co.uk

In the "Test URL against URL Filters" option something weird is happening. The URL filter is set to be restricted to the starting domain. If I type "google" in the test box it says "The URL is rejected reason: URL filters | server"

However, if I type in "http://www.google.co.uk" it says "The URL will be downloaded". The same happens as follows:

"bbc" - not downloaded
"http://www.bbc.co.uk" - downloaded

And I tried this:

"rubbish" - not downloaded
"rubbish.co.uk" - downloaded.

OE Pro seems to think that the starting domain is ".co.uk", NOT "mycompany.co.uk". If I make it "rubbish.org.uk" the test says that it won't get downloaded. It looks like a problem where the starting URL has a domain component with 3 parts or more.
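
The test results above are consistent with a filter that treats the last two labels of the host name as "the domain". The short Python sketch below illustrates that failure mode next to a suffix-aware alternative. It is only an illustration under stated assumptions, not Offline Explorer's actual code: the function names are hypothetical, and the tiny hard-coded suffix set stands in for a full list of public suffixes.

```python
# Hypothetical illustration; not Offline Explorer's actual implementation.
# Small sample of two-level public suffixes (assumption, far from complete).
TWO_LEVEL_SUFFIXES = {"co.uk", "org.uk", "ac.uk", "gov.uk"}


def naive_domain(host):
    """Take the last two labels: fine for example.com, wrong for example.co.uk."""
    return ".".join(host.lower().split(".")[-2:])


def registrable_domain(host):
    """Keep one extra label when the host ends in a known two-level suffix."""
    labels = host.lower().split(".")
    if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def same_domain(start_host, candidate_host, domain_of):
    """Compare two hosts using the given domain-extraction function."""
    return domain_of(start_host) == domain_of(candidate_host)


start = "www.mycompany.co.uk"
for candidate in ("www.google.co.uk", "rubbish.co.uk",
                  "rubbish.org.uk", "intranet.mycompany.co.uk"):
    print(candidate,
          "naive:", same_domain(start, candidate, naive_domain),
          "suffix-aware:", same_domain(start, candidate, registrable_domain))
```

With the naive version, every host ending in ".co.uk" matches the starting domain while "rubbish.org.uk" does not, which mirrors what the test box reports.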

Project pasted below.

Peter





[Object]
OEVersion=Pro 5.9.0.3318
Type=0
IID=7044
Caption=Intranet
URL=http://www.mycompany.co.uk/users/main.php
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FMGroup=2
SkipMedia=True
FTText.Exts=aspaspxcfmhtmhtmlhtxidcjspphpphp3rxmlshtmlstmstmltexttxtwmlxmlxsp xxxxxxxxxxxxxxxxxxx
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovflvfliflcvivrmramrvasfasxwmvm1vm2vvobsmilmp4
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexe
FTUDef.Exts=jscssssivbsdtdxslswfclassent
FTText.B=xoxooo
FTImages.B=xoxooo
FTVideo.B=ooxooo
FTAudio.B=ooxooo
FTArchive.B=ooxooo
FTUDef.B=ooxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
NotIgnoreLogout=False
RSrvsBx=3
RProt=255
LastStart=128:103:25:221:84:208:227:64:
LastEnd=211:1:113:222:84:208:227:64:
LastStarted=04/02/2011 15:38:51
LastEnded=04/02/2011 15:39:06
S200=8
S304=5
SAbr=34
SPar=10
SLast=200
SHTML=8
SSuccDowns=3
LFiles=13
LSize=79455
Stopped=True
Flags=1
CFFlags=80
ApplyAllSubsts=True
ImgDim=0,0,0,0
ParseComplexScripts=True
ConvertRSS=True
IPAddr=-1662355282
LIndexed=False
IndexFiles=False



Oleg Chernavin 02/10/2011 08:05 am
Thank you for the settings! I found and fixed the error. Here is the updated oe.exe file:

http://www.metaproducts.com/download/betas/OEP3321.ZIP

Oleg.
Peter Harkness 02/11/2011 04:10 am
That worked great, thanks!
Oleg Chernavin 02/11/2011 05:48 am
You are welcome!

Oleg.
Peter Harkness 02/15/2011 06:26 am
Oleg,

There is still a problem. The changes you made in the 5.9.3321 Service Release 4 file you gave me have broken other server URL filters. Try the project below, which is for a start URL of:

http://archive.intranet.com/archive/

In version 5.9.3321 SR4, typing www.google.com into the test box gives a result that the URL WILL be downloaded, which is wrong as this project is restricted to servers in the starting domain.

However, if I reinstall the old 5.9.3318 Service Release 3 binary, the URL filter says "The URL is rejected. Reason: URL Filters | Server".

So version 5.9.3318 works for xxx.com domain filters but not for xxx.co.uk domains.

Version 5.9.3321 works for xxx.co.uk domains but not for xxx.com domains!
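
Because each build handles one suffix style and breaks the other, a check that covers both styles in one run is useful. The following Python sketch runs such a check against the URLs mentioned in this thread. It is an assumption-laden illustration, not the product's logic: `host_of`, `site_of`, `allowed` and the two-entry suffix set are hypothetical names and data, and the last case (`www.intranet.com`) is an invented positive example.

```python
from urllib.parse import urlsplit

# Sample of two-level public suffixes; an assumption for illustration only.
TWO_LEVEL_SUFFIXES = {"co.uk", "org.uk"}


def host_of(text):
    """Accept a full URL or a bare host name, as typed into a test box."""
    if "://" not in text:
        text = "//" + text  # lets urlsplit treat the text as a network location
    return (urlsplit(text).hostname or "").lower()


def site_of(text):
    """Return the part of the host used for the same-domain comparison."""
    labels = host_of(text).split(".")
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def allowed(start_url, candidate):
    """True when the candidate is in the same domain as the start URL."""
    return site_of(start_url) == site_of(candidate)


# (start URL, candidate typed into the test box, expected result)
CASES = [
    ("http://www.mycompany.co.uk/users/main.php", "http://www.google.co.uk", False),
    ("http://www.mycompany.co.uk/users/main.php", "rubbish.co.uk", False),
    ("http://archive.intranet.com/archive/", "www.google.com", False),
    ("http://archive.intranet.com/archive/", "http://www.intranet.com/", True),
]

failures = [(s, c) for s, c, expected in CASES if allowed(s, c) != expected]
print("failures:", failures)
```

Keeping the .co.uk and .com cases in the same table means a fix for one style cannot silently break the other.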

Peter


[Object]
OEVersion=Pro 5.9.0.3318
Type=0
IID=7046
Caption=Test 2
URL=http://archive.intranet.com/archive/
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
SkipMedia=True
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=aniasfasxaviflcfliflvm1vm2vmovmp4mpegmpgramrmrvsmilvivvobwmv oooooooooxoooooooooo
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexe
FTUDef.Exts=jscssssivbsdtdxslswfclassent
FTText.B=ooxooo
FTImages.B=ooxooo
FTVideo.B=ooxooo
FTAudio.B=ooxooo
FTArchive.B=ooxooo
FTUDef.B=ooxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
NotIgnoreLogout=False
RSrvsBx=3
RProt=255
LastStart=58:198:237:208:174:209:227:64:
LastEnd=137:125:199:230:174:209:227:64:
LastStarted=15/02/2011 11:06:43
LastEnded=15/02/2011 11:10:34
S200=218
SAbr=16993
SPar=156
SSav=218
SLast=200
SSiz=16536776
SMdf=218
SHTML=155
LFiles=218
LSize=16536776
Stopped=True
Flags=1
ImgDim=0,0,0,0
ConvertRSS=True
IPAddr=-1767896568
LIndexed=False
IndexFiles=False
Oleg Chernavin 02/18/2011 04:59 pm
I fixed that. Sorry for the silly bug!

http://www.metaproducts.com/download/betas/OEP3323.ZIP

Oleg.