custom link parsing

Author Message
Kevork Kevorkian 06/29/2006 07:05 am
The server returns multiple pages " searchAMResult.aspx?...&...&... " containing links in the following format:

content.aspx@aID=79446&searchStr=abdominojugular+reflex+test#79446

They are identical to the already downloaded:

content.aspx@aID=79446

Is there a way to tell OE to remove the ...|&searchStr=abdominojugular+reflex+test|#... portion of the link from all the files that have the "searchAMResult" prefix in their title when exporting the project? Or is there other software that can help me post-process the already downloaded searchAMResult.aspx documents?
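
For the post-processing route, a minimal Python sketch follows. It is not an OE feature. It assumes the links look exactly like the example above (content.aspx, then "@" or "?", then a numeric aID, then the searchStr parameter and an optional #fragment). The names strip_search_suffix and clean_folder are hypothetical, and the export folder path is whatever you pass in. Try it on a copy of the files first.

```python
import re
from pathlib import Path

# Keep "content.aspx@aID=<digits>" (or the "?" form) and drop an optional
# searchStr parameter and an optional #fragment that follow it.
LINK_RE = re.compile(
    r"(content\.aspx[@?]aID=\d+)"
    r"(?:&(?:amp;)?searchStr=[^\"'#\s>]*)?"
    r"(?:#[^\"'\s>]*)?"
)


def strip_search_suffix(html):
    """Return html with the searchStr and #fragment parts removed from links."""
    return LINK_RE.sub(r"\1", html)


def clean_folder(folder):
    """Rewrite every file whose name starts with 'searchAMResult' under folder.

    Returns the number of files that were changed.
    """
    changed = 0
    for path in Path(folder).rglob("searchAMResult*"):
        if not path.is_file():
            continue
        # latin-1 maps every byte to one character, so the round trip is lossless
        # even when the real page encoding is unknown.
        text = path.read_bytes().decode("latin-1")
        cleaned = strip_search_suffix(text)
        if cleaned != text:
            path.write_bytes(cleaned.encode("latin-1"))
            changed += 1
    return changed
```

Calling clean_folder on the exported folder would rewrite the matching files in place. Links that carry other parameters between aID and searchStr are left untouched by this pattern.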
Oleg Chernavin 07/04/2006 05:57 am
The # symbol is not part of the URL, so URL Substitutes do not touch it. It is preserved in the HTML code as-is.

Oleg.
Oleg Chernavin 07/17/2006 05:59 am
I remember now. MS IE doesn't handle # symbols correctly when you view a page from the hard disk. It treats the # part as part of the filename, while other browsers, like Opera and Firefox, correctly treat it as a link inside the page. This is why OE keeps this part only when exporting to EXE files, because they contain an internal Web server.

Oleg.
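
To illustrate the split between the address and the # part, here is a small sketch using Python's standard urllib.parse module. It only shows how a standards-following parser separates the two pieces of the link from the first post. It does not reproduce what IE or OE do internally.

```python
from urllib.parse import urldefrag

# The link exactly as it appears in the downloaded page.
link = "content.aspx@aID=79446&searchStr=abdominojugular+reflex+test#79446"

# urldefrag separates the address from the in-page anchor after "#".
address, anchor = urldefrag(link)

print(address)  # the part that names the file or resource
print(anchor)   # the part that names a position inside the page
```

A browser that follows this split opens the file named by the first part and then scrolls to the anchor. A browser that treats the whole string as a filename looks for a file that does not exist.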
Kevork Kevorkian 07/17/2006 07:16 am
Is there a solution to this issue? Would the agent identification settings matter here?

As I can't post a new question, I'll ask here: is it possible to synchronize the contents of the already downloaded pages with recently altered project settings? For example, if I expand the project by removing some URL filter limitations, can I make the links in the already downloaded files point to the newly downloaded files instead of to the web (in the on-line link translation mode) without downloading them again?
Oleg Chernavin 07/17/2006 07:39 am
Sorry for the issue with the forum - we are working on it now.

You can do the following - add the line to the URLs field of the Project:

Additional=KeepPrimary

Download the Project. This will allow you to use Ctrl+F5 to update all links in the site, but the site will keep original HTML files on the disk.

Oleg.
Kevork Kevorkian 07/17/2006 08:10 am
I am downloading a site that requires users to log in. After the login expires, I receive login confirmation pages instead of the desired contents, so I used to delete the folders with the expired login from the map tree. Does this also delete the primary files? I don't want to get the original login prompt pages when updating the project.
Oleg Chernavin 07/17/2006 08:48 am
Removing files will also remove their .primary copies.

Oleg.
Kevork Kevorkian 07/18/2006 01:43 am
Unfortunately, adding
Additional=KeepPrimary
and pressing Ctrl+F5 after stopping the download did not change the link references - they still pointed to the original online location of the files newly allowed by the project settings. Is there a way to overcome this, i.e. to translate

http://www.accessmedicine.com/... into
http://127.0.0.1:800/Default/www.accessmedicine.com/... ?
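
The translation asked about here can be sketched in Python as follows. This is only an illustration of the mapping, not how OE performs it. The prefix http://127.0.0.1:800/Default/ is taken from the example above, the function name to_local and the allowed_hosts parameter are hypothetical, and the path, query and # part are assumed to carry over unchanged.

```python
from urllib.parse import urlsplit

# Prefix taken from the example in the post; a different setup may differ.
LOCAL_PREFIX = "http://127.0.0.1:800/Default/"


def to_local(url, allowed_hosts=("www.accessmedicine.com",)):
    """Map an online URL on an allowed host to its local counterpart.

    URLs on other hosts, or with a scheme other than http/https,
    are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return url
    if parts.hostname not in allowed_hosts:
        return url
    # Everything after "scheme://" (host, path, query, fragment) is kept as-is.
    rest = url.split("://", 1)[1]
    return LOCAL_PREFIX + rest


print(to_local("http://www.accessmedicine.com/content.aspx?aID=79446"))
```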

My current project looks like this:

[Object]
OEVersion=Enterprise 4.3.0.2418
Type=0
IID=2
Caption=Access Medicine
URL=http://www.accessmedicine.com/index.aspxAdditional=KeepPrimary
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FMGroup=2
LTMethod=1
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovfliflcvivrmramrvasfasxwmvm1vm2vvob
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgz
FTUDef.Exts=jscssssivbsdtdxslswfclass
FTText.B=ooxooo
FTImages.B=ooxooo
FTVideo.B=ooxooo
FTAudio.B=ooxooo
FTArchive.B=ooxooo
FTUDef.B=ooxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,3,0
RSrvsBx=2
RSrvsIn=www.accessmedicine.comcp.gsm.comwww.clinicalpharmacology.com xxx
RPathBx=1
RFileBx=2
RFileEx=logoutemailtocolleaguemyaccessmedicineloginimageindexdrugsdiagguidelinesquickampocketdiagnosticpatientedhealthnewsrssatozindexxdxasearchaddtolightbox xxxoooooooooxxox
RProt=127
LastStart=122:53:241:188:118:0:227:64:
LastEnd=14:191:234:162:118:0:227:64:
S200=3146
S304=108572
S400=2
SPar=96620
SSav=3146
SLast=200
SSiz=64480627
SMdf=3146
LFiles=111720
LSize=64847079
CFFlags=48
Substs=content.aspx &searchStr=* popup.aspx &searchStr=* 
ImgDim=0,0,0,0
PrevURL=http://www.accessmedicine.com/index.aspx
Exported=17.7.2006 a. 18:46:15 - D:\tempAM\

Are there any additional settings that have to be changed, or do I have to delete the files and download them once again? They are in the range of several GB. Is it also possible to influence the contents of exported projects according to project settings changed after the actual download? I did a test with the images check box unmarked in the file filters category, but all the downloaded images were still exported.
Oleg Chernavin 07/18/2006 09:50 am
You need to download the whole site again using Alt+F5 first. After that, changing the Project settings and pressing Ctrl+F5 to get the missing files will update the links.

Oleg.