Sequencing of links?

Author Message
ssieloff 11/17/2005 11:08 am
Oleg --

Another interesting challenge, this time with multiple navigation links at the bottom of the page. This sex offender site uses First+Page, Next+Page, Previous+Page and Last+Page as navigational links at the bottom of each results page. The problem is that once the last, first or previous page is loaded, it "poisons" the other links, because they all reference from the current page. For example, the first page contains the Last+Page link which, when parsed after page 1 is processed, sets the references to the end of the list, and every subsequent Next+Page request just returns the last page again. I basically get the first and last pages but nothing in between. The information appears to be passed to the application via POST, but it depends on the previous page information for valid indexing and display. Here is the base link:

http://www.nj.gov/njsp/info/reg_sexoffend.html

Click "accept" and you will get this link:

https://www6.state.nj.us/LPS_spoff/SetSession which appears to set the sessionId in the cookie ... choose Geographical search and you will get the following page:

Choose the county of "Atlantic" from the drop down and hit "Submit" and you will see the following:

https://www6.state.nj.us/LPS_spoff/geographicsearch1.jsp

Hit "Submit" for the option of using all of the county "Atlantic" and you get the first page of results. I want to be able to load and process the entire list of pages by simply following the "Next Page" button and parsing the subsequent links on each page, but the other navigational links appear to be messing with the session variables and do not let me retrieve all pages. Can you help me again?

Here is one of several project files I have tried to use (unsuccessfully). This would appear to be fairly simple, as the links are not the usual complex JavaScript style, but I am stuck!

Thanks,

Steve

[Object]
OEVersion=Pro 3.9.0.2125
Type=0
IID=7019
Caption=https://www6.state.nj.us/LPS_spoff/findDriver
URL=https://www6.state.nj.us/LPS_spoff/findDriverPOST=screen=1&county=01&countyname=&pmonth=&pday=&pyr=&Submit=SubmitIgnoreLogOutLinksAdditional=ConvertPOSTToFileNameSetCookie=JSESSIONID=08C92E5BB463B6ECA3B9A15567569111; JROUTE=9-mO
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
EnableForms=True
FMGroup=1
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovfliflcvivrmramrvasfasxwmvm1vm2vvob
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdf
FTUDef.Exts=jscssssivbsdtdxslswf
FTText.B=ooxooo
FTImages.B=ooxooo
FTVideo.B=ooxooo
FTAudio.B=ooxooo
FTArchive.B=ooxooo
FTUDef.B=ooxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,3,0
RSrvsBx=3
RPathBx=1
RFileEx=searchresults.jsp?type=last+pagesearchresults.jsp?type=first+pagesearchresults.jsp?type=previous+page xxx
RProt=127
LastStart=85:119:43:38:46:226:226:64:
LastEnd=20:108:54:49:46:226:226:64:
S200=374
S400=1
SAbr=357
SPar=350
SSav=374
SLast=200
SSiz=7874497
SMdf=108
LFiles=375
LSize=7958465
Stopped=True
ImgDim=0,0,0,0
PrevURL=https://www6.state.nj.us/LPS_spoff/findDriver
ExploreDirs=True
ExploreSSMaps=True
ParseComplexScripts=True
ssieloff 11/22/2005 08:25 pm
Oleg --

Any ideas here?

Have a safe Thanksgiving!

Steve


Oleg Chernavin 11/23/2005 01:42 pm
I am sorry. I am out of the office now. I took a quick look at it, but I haven't solved it yet. I will work on it more.

Oleg.
ssieloff 11/25/2005 03:10 pm
Oleg --

I hope you had a good Thanksgiving Day and didn't eat too much turkey and trimmings! Not to push, but is there any update on how to download these links/pages?

Thanks,

Steve

ssieloff 11/28/2005 08:12 pm
Oleg --

Have you had any further ideas on this site?

Thanks,

Steve

Oleg Chernavin 11/29/2005 09:11 am
Sorry that it took a while.

The problem is that all links to the same page are identical. I would suggest that you use two Projects. The first stays as it is, but with forms exploration disabled. The other one would be made with Alt+Ctrl on the Next Page button. Set it to Level=1 and allow File Copies. You download the first Project once, and then you have to download the second one sequentially, as many times as it takes to get all pages.

I know, it is manual, but I don't see another way.

Oleg.
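
For anyone reading this thread later, here is a minimal Python sketch of why the crawl gets "poisoned" and why requesting only Next Page, one request at a time, recovers every page. It is not Offline Explorer code and it does not talk to the real site: `StatefulResults` is a hypothetical stand-in for a server that keeps the current page number in the session, and the function names, the order of the navigation links and the "stop when the page stops changing" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed order of the navigation links on each results page.
NAV_LINKS = ("first page", "next page", "previous page", "last page")


@dataclass
class StatefulResults:
    """Toy server: the page you get depends on the session's current page."""
    total_pages: int
    current: int = 1

    def fetch(self, nav: str) -> int:
        """Apply one navigation request and return the page number now shown."""
        if nav == "first page":
            self.current = 1
        elif nav == "last page":
            self.current = self.total_pages
        elif nav == "next page":
            self.current = min(self.current + 1, self.total_pages)
        elif nav == "previous page":
            self.current = max(self.current - 1, 1)
        else:
            raise ValueError(f"unknown navigation type: {nav}")
        return self.current


def crawl_each_link_once(server: StatefulResults) -> set:
    """Mimic a crawler that treats identical URLs as one file and
    requests every navigation link exactly once."""
    seen = {server.current}
    for nav in NAV_LINKS:
        seen.add(server.fetch(nav))
    return seen


def crawl_next_only(server: StatefulResults) -> list:
    """Request only 'next page', strictly in sequence, until the
    returned page stops changing (assumed end-of-list behaviour)."""
    pages = [server.current]
    while True:
        page = server.fetch("next page")
        if page == pages[-1]:
            break
        pages.append(page)
    return pages


if __name__ == "__main__":
    print(sorted(crawl_each_link_once(StatefulResults(total_pages=5))))
    print(crawl_next_only(StatefulResults(total_pages=5)))
```

The second function corresponds to the manual workaround above: the other three links are never requested, and the same Next Page URL is fetched repeatedly and saved as a separate copy each time.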