Saving downloaded links

Peter Hawkins 05/27/2009 12:01 pm
Is there a way to identify from the logs when a downloaded file is actually saved to disk? I understand that files can be downloaded for parsing, or can be downloaded and saved. I see a file downloaded in the log but it doesn't get saved to disk, while other files do.

What determines whether a file is downloaded, parsed and discarded, or downloaded, parsed and saved?

I've been using OE Pro for a while and am happy, but the concept of "downloaded" has always confused me. For example, if I test a URL against URL Filters and the test says "This URL will be downloaded", does this mean downloaded and saved to disk, or just downloaded for parsing?


Also, how does the new "Ignore Logout Links" option work? What sort of links does it look for?

Thanks
Oleg Chernavin 05/28/2009 06:47 am
Actually, all downloaded files are saved. There are only three exceptions:

1. The disk is full.
2. You uncheck the File Filters - Text category. This means that Web pages will be loaded to get links to follow, but they are not saved. This is useful when you want to download only images.
3. Contents Filters are set to not save Web pages that contain, or do not contain, the keywords you enter there.
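
The three exceptions can be summarized as a small decision function. This is only an illustrative sketch in Python, not how Offline Explorer is implemented: the class, the field names and the case-insensitive keyword match are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ProjectSettings:
    # Hypothetical fields; they do not map to real Offline Explorer setting keys.
    disk_has_space: bool = True
    text_category_checked: bool = True       # File Filters - Text
    save_pages_with_keywords: bool = True    # Contents Filters
    save_pages_without_keywords: bool = True  # Contents Filters
    keywords: tuple = ()


def will_save_page(settings: ProjectSettings, page_text: str) -> bool:
    """Return True if a downloaded Web page would be written to disk."""
    # Exception 1: nowhere to write the file.
    if not settings.disk_has_space:
        return False
    # Exception 2: pages are loaded only to extract links.
    if not settings.text_category_checked:
        return False
    # Exception 3: Contents Filters decide based on keyword presence.
    text = page_text.lower()
    has_keyword = any(k.lower() in text for k in settings.keywords)
    if has_keyword:
        return settings.save_pages_with_keywords
    return settings.save_pages_without_keywords
```

Note that if both Contents Filters options are set to "do not save", the function returns False for every page, whether or not it contains a keyword.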

If none of the above apply, can you give me some hints on how I can reproduce this myself? Thank you!

Best regards,
Oleg Chernavin
MP Staff
Peter Hawkins 05/29/2009 09:31 am
Oleg,

This is an internal intranet site, so you won't be able to access it. Enclosed is a log extract of the file being downloaded. This is a BIG site with lots of small files, so I also tried with OE Enterprise. I have a URL substitute rule which does:

http://intranet.internal/sequence.php?lot=* ---> http://intranet.internal/php/*/sequence.php.htm


The sequence.php.htm file never gets saved to local disk.
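
To make the intended effect of the rule concrete, here is a minimal Python sketch of a single-wildcard substitution. It only illustrates the mapping described above and assumes that `*` captures any text and is copied into the replacement; the function name `substitute` is hypothetical, and Offline Explorer's real matching rules may differ.

```python
import re


def substitute(url: str, pattern: str, replacement: str) -> str:
    """Apply a rule with one '*' wildcard; return the URL unchanged on no match."""
    # Escape the pattern literally, then turn the escaped '*' into a capture group.
    regex = "^" + re.escape(pattern).replace(r"\*", "(.*)") + "$"
    match = re.match(regex, url)
    if match is None:
        return url
    # Copy the captured text into the first '*' of the replacement.
    return replacement.replace("*", match.group(1), 1)


print(substitute(
    "http://intranet.internal/sequence.php?lot=005475",
    "http://intranet.internal/sequence.php?lot=*",
    "http://intranet.internal/php/*/sequence.php.htm",
))
# prints http://intranet.internal/php/005475/sequence.php.htm
```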

Peter

Offline Explorer Enterprise 5.5.2994
HTTP0 29/05/2009 14:17:52 GET /sequence.php?lot=005475 HTTP/1.1
Accept: */*
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Referer: http://intranet.internal/updates.htm
Host: intranet.internal
Cookie: Size=1536; psso%5fYTZkY2RlYTRiMGE1MTRhMzM5YWYxZWNkYzBmM2I5Nzg%3d=V1F0UXMvTUlhRFF5V0dlcFhVN3JqUko3NWptbTNvMm53VmFNblY3eUd0QT0K
HTTP0 29/05/2009 14:17:53 HTTP/1.1 200 OK
Date: Fri, 29 May 2009 13:17:55 GMT
Server: Apache/2.2.0 (Fedora)
X-Powered-By: PHP/5.2.1
Content-Length: 13164
Content-Type: text/html
HTTP0 29/05/2009 14:17:53 Reading data
HTTP0 29/05/2009 14:17:53 7% of 13164 bytes of http://intranet.internal/sequence.php?lot=005475.
HTTP0 29/05/2009 14:17:53 15% of 13164 bytes of http://intranet.internal/sequence.php?lot=005475.
HTTP0 29/05/2009 14:17:53 23% of 13164 bytes of http://intranet.internal/sequence.php?lot=005475.
HTTP0 29/05/2009 14:17:53 31% of 13164 bytes of http://intranet.internal/sequence.php?lot=005475.
HTTP0 29/05/2009 14:17:53 38% of 13164 bytes of http://intranet.internal/sequence.php?lot=005475.
HTTP0 29/05/2009 14:17:53 46% of 13164 bytes of http://intranet.internal/sequence.php?lot=005475.
HTTP0 29/05/2009 14:17:53 54% of 13164 bytes of http://intranet.internal/sequence.php?lot=005475.
HTTP0 29/05/2009 14:17:53 62% of 13164 bytes of http://intranet.internal/sequence.php?lot=005475.
HTTP0 29/05/2009 14:17:53 70% of 13164 bytes of http://intranet.internal/sequence.php?lot=005475.
HTTP0 29/05/2009 14:17:53 77% of 13164 bytes of http://intranet.internal/sequence.php?lot=005475.
HTTP0 29/05/2009 14:17:53 85% of 13164 bytes of http://intranet.internal/sequence.php?lot=005475.
HTTP0 29/05/2009 14:17:53 93% of 13164 bytes of http://intranet.internal/sequence.php?lot=005475.
HTTP0 29/05/2009 14:17:53 100% of 13164 bytes of http://intranet.internal/sequence.php?lot=005475.
QUEUE 29/05/2009 14:17:53 Parsing (0). http://intranet.internal/sequence.php?lot=005475
PARSER 29/05/2009 14:17:53 Rejected URL (Disabled in a File Filters section): http://intranet.internal/intranet.css
QUEUE 29/05/2009 14:17:53 Parsing end.
QUEUE 29/05/2009 14:17:53 Parsing files added.
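
A log like this can be summarized with a short script. The Python sketch below is based only on the line shapes visible in the extract above (the "100% of N bytes of URL." and "Rejected URL (reason): URL" lines); other log formats are not handled, and the function name `summarize` is made up for the example. It reports which URLs finished downloading and which were rejected. As the extract shows, the log contains no separate line saying that a file was saved, so the script cannot report that.

```python
import re

# Patterns derived from the log extract; the trailing '.' after the URL is dropped.
DONE = re.compile(r"^HTTP\d+ \S+ \S+ 100% of (\d+) bytes of (\S+?)\.?$")
REJECTED = re.compile(r"^PARSER \S+ \S+ Rejected URL \((.+?)\): (\S+)$")


def summarize(log_lines):
    """Return ({url: size_in_bytes}, {url: rejection_reason})."""
    downloaded, rejected = {}, {}
    for line in log_lines:
        line = line.strip()
        m = DONE.match(line)
        if m:
            downloaded[m.group(2)] = int(m.group(1))
            continue
        m = REJECTED.match(line)
        if m:
            rejected[m.group(2)] = m.group(1)
    return downloaded, rejected
```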
Oleg Chernavin 05/29/2009 09:59 am
Can you post the Project settings here? I will emulate this on our test server. Please select the Project, use Export - Project Settings - Copy, and paste the result into a forum message.

Oleg.
Peter Hawkins 05/29/2009 10:43 am
Here you go.

Stream 1.2 File
[Object]
OEVersion=Enterprise 5.5.0.2994
Type=0
IID=7049
Caption=Intranet Extraction
URL=http://intranet.internal/home.htm
Additional=NoMovedDuplicates
MVer=5
Lev=1
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovflvfliflcvivrmramrvasfasxwmvm1vm2vvobsmil
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexe
FTUDef.Exts=jscssssivbsdtdxslswfclassent
FTText.B=ooxooo
FTImages.B=xoxooo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=xoxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,3,0
NotIgnoreLogout=False
RSrvsBx=1
RPathEx=forums x
RProt=255
LastStart=0:0:0:0:0:0:0:0:
LastEnd=0:0:0:0:0:0:0:0:
CFKeywords="customer support"
CFFlags=80
SubstsB=aHR0cDovL2ludHJhbmV0LmludGVybmFsL3NlcXVlbmNlLnBocD9sb3Q9KglodHRwOi8vaW50cmFuZXQuaW50ZXJuYWwvc2VxdWVuY2UucGhwP2xvdD0qCWh0dHA6Ly9pbnRyYW5ldC5pbnRlcm5hbC9waHAvKi9zZXF1ZW5jZS5waHAuaHRtCVgNCg==
ApplyAllSubsts=True
ImgDim=0,0,0,0
ParseComplexScripts=True
LIndexed=False
IndexFiles=False
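
The SubstsB value in the export above is Base64-encoded text, so it can be inspected with a few lines of Python. This is a sketch for reading the export only: the tab-separated layout is what the decoded bytes show, but the meaning of each field is not documented in this thread, and the function name `decode_substs` is made up for the example.

```python
import base64

# Value copied verbatim from the SubstsB line of the exported settings.
SUBSTS_B = (
    "aHR0cDovL2ludHJhbmV0LmludGVybmFsL3NlcXVlbmNlLnBocD9sb3Q9KglodHRwOi8v"
    "aW50cmFuZXQuaW50ZXJuYWwvc2VxdWVuY2UucGhwP2xvdD0qCWh0dHA6Ly9pbnRyYW5l"
    "dC5pbnRlcm5hbC9waHAvKi9zZXF1ZW5jZS5waHAuaHRtCVgNCg=="
)


def decode_substs(value: str):
    """Decode the Base64 text and split each line into its tab-separated fields."""
    text = base64.b64decode(value).decode("utf-8")
    return [line.split("\t") for line in text.splitlines() if line]


for fields in decode_substs(SUBSTS_B):
    print(fields)
```

The decoded rule contains the pattern http://intranet.internal/sequence.php?lot=* and the replacement http://intranet.internal/php/*/sequence.php.htm, which matches the URL substitute rule described earlier in the thread.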
Oleg Chernavin 06/01/2009 09:35 am
This is case (3) - Contents Filters. You have "customer support" there, and the filters are set to not save pages that contain the keywords as well as pages that do not contain them.

Oleg.
Peter Hawkins 06/01/2009 10:06 am
Oleg,

I've checked the HTML produced by the page sequence.php. It does not contain the words "customer support".

Peter
Oleg Chernavin 06/01/2009 10:58 am
Yes, but in the Properties - Contents Filters section you have set up Offline Explorer to stop saving all files, whether or not they contain the keywords. Please review this setting.

Oleg.