FDLE predator site

Author Message
ssieloff 05/25/2005 08:59 pm
Oleg --

It has been quite a while since I've posted to this forum, so forgive me if I am a little "rusty" ... I am trying to download the entire FL sexual predator site ... I want to get all Flyers that contain the picture and details of each predator ... the major pages are OffenderFlyer.asp?keys= and Result.asp?Cmd=Page&PageNumber= ... I turned off the level limit and set lname={:a..z} in the main search URL for the project, but not all the links get followed ... I get about 1500 flyers, but it does not appear to be following the multiple search results pages for each A-Z search. Here is my project as it stands -- I'd appreciate any ideas you might have! Thanks, Steve

[Object]
OEVersion=Pro 3.8.0.2022
Type=0
IID=7018
Caption=http://www3.fdle.state.fl.us/sexual_predators/result.asp
URL=http://www3.fdle.state.fl.us/sexual_predators/result.aspPOST=PageID=Search.asp&SPage=253863740&GeoOptions=0&fname=&lname={:a..z}&city=&zip=&idSearch=SubmitIgnoreLogOutLinksAdditional=ConvertPOSTToFileNameReferer=http://www3.fdle.state.fl.us/sexual_predators/search.asp?sopu=true&PSessionId=253863740&SetCookie=ASPSESSIONIDCQDSCDTD=GBOFBCPAIEJIKICHIJABIDPM; ASPSESSIONIDCQDSCDTD=GBOFBCPAIEJIKICHIJABIDPM; ASPSESSIONIDAQCQCDTC=CIAAEOPDNAONFHLLBPMPGGJM
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovfliflcvivrmramrvasfasxwmvm1vm2vvob
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdf
FTUDef.Exts=jscssssivbsdtdxslswf
FTText.B=ooxooo
FTImages.B=ooxooo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=xoxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,3,0
RProt=127
LastStart=103:152:198:111:59:204:226:64:
LastEnd=242:200:70:186:59:204:226:64:
S200=1512
S304=1
SAbr=4
SPar=1183
SSav=1512
SLast=200
SSiz=21792253
SMdf=1512
LFiles=1513
LSize=103855597
Stopped=True
Flags=1
ImgDim=0,0,0,0
PrevURL=http://www3.fdle.state.fl.us/sexual_predators/result.asp
Oleg Chernavin 05/26/2005 04:48 am
Steve,

I tried to load it myself and it worked well. Here is what I did: I browsed to the following URL in the Internal browser:

http://www3.fdle.state.fl.us/sexual_predators/search.asp?sopu=true&PSessionId=614109604&

I typed a in the Last name field, recorded the form (Ctrl+Alt), then removed the SetCookie= line from the URLs field and changed the a to {:a..c}. Then I loaded the Project, and the offline search results for all 3 letters (a, b and c) were exactly as on the original site.
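
To make the effect of the {:a..c} macro concrete, here is a minimal Python sketch of how such a range could be expanded into one POST body per letter. It only illustrates the idea and is not Offline Explorer's own implementation; the function name expand_range_macro is hypothetical, and only single-character ranges are handled.

```python
import re

# Matches a single-character range macro such as {:a..c} or {:a..z}.
MACRO = re.compile(r"\{:(\w)\.\.(\w)\}")


def expand_range_macro(template):
    """Return one string per character in the first {:x..y} range found.

    Hypothetical helper: a template without a macro is returned unchanged
    as a one-element list.
    """
    match = MACRO.search(template)
    if match is None:
        return [template]
    start, end = match.group(1), match.group(2)
    if ord(start) > ord(end):
        raise ValueError("range start must not come after range end")
    prefix = template[:match.start()]
    suffix = template[match.end():]
    return [prefix + chr(code) + suffix
            for code in range(ord(start), ord(end) + 1)]


if __name__ == "__main__":
    for body in expand_range_macro("fname=&lname={:a..c}&city=&zip="):
        print(body)
```

With {:a..z} the same call would yield 26 bodies, one search per last-name initial.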

I would guess that after you exit OE you will need to browse to the search page again in the Internal browser to set the correct cookie, and only then start loading the Project.
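
The same "fresh cookie first, then search" order can be sketched outside OE with Python's standard library. This is only an illustration under assumptions: the form field names are copied from the POST= line of the project above, the helper names make_opener and build_search_request are hypothetical, and treating SPage as the same value as PSessionId is a guess based on the two matching numbers in that project. Nothing is sent over the network unless the commented lines at the end are enabled.

```python
import http.cookiejar
import urllib.parse
import urllib.request

BASE = "http://www3.fdle.state.fl.us/sexual_predators/"


def make_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_search_request(last_name, session_id):
    """Build (but do not send) the search POST for one last-name value."""
    fields = {
        "PageID": "Search.asp",
        "SPage": session_id,  # assumption: same value as PSessionId
        "GeoOptions": "0",
        "fname": "",
        "lname": last_name,
        "city": "",
        "zip": "",
        "idSearch": "Submit",
    }
    data = urllib.parse.urlencode(fields).encode("ascii")
    referer = BASE + "search.asp?sopu=true&PSessionId=" + session_id
    return urllib.request.Request(
        BASE + "result.asp", data=data, headers={"Referer": referer})


if __name__ == "__main__":
    opener, jar = make_opener()
    request = build_search_request("a", "253863740")
    print(request.get_method(), request.full_url)
    # To actually run the search, first visit the search page so the server
    # sets a fresh session cookie, then send the POST with the same opener:
    # opener.open(BASE + "search.asp?sopu=true")
    # opener.open(request)
```

No SetCookie value is hard-coded here; the cookie jar picks up whatever the search page sets, which mirrors removing the SetCookie= line from the URLs field.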

Best regards,
Oleg Chernavin
MP Staff
ssieloff 05/26/2005 07:00 pm
OK ... but did the program follow/load all the sub-pages for last name A and get the Flyer pages? I need to crawl all returned pages for the last name A and extract the flyer details from each results page ... then move to B getting all flyer details from all B pages (1-99) etc.
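
To spell out what I mean by following the sub-pages, here is a rough Python sketch of the link sorting I am after: from each results page, collect the OffenderFlyer.asp?keys= links and the Result.asp?Cmd=Page&PageNumber= links separately. The sample HTML and the keys values in it are made up for illustration, and the names ResultLinkParser and extract_links are hypothetical; the real page markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResultLinkParser(HTMLParser):
    """Collect flyer links and result-page links from one results page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.flyers = []
        self.pages = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        lowered = href.lower()
        absolute = urljoin(self.page_url, href)
        if "offenderflyer.asp?keys=" in lowered:
            self.flyers.append(absolute)
        elif "result.asp?cmd=page&pagenumber=" in lowered:
            self.pages.append(absolute)


def extract_links(html, page_url):
    """Return (flyer_urls, result_page_urls) found in the given HTML."""
    parser = ResultLinkParser(page_url)
    parser.feed(html)
    return parser.flyers, parser.pages


# Made-up sample markup; the keys values are placeholders.
SAMPLE = """
<a href="OffenderFlyer.asp?keys=111">Flyer</a>
<a href="OffenderFlyer.asp?keys=222">Flyer</a>
<a href="Result.asp?Cmd=Page&amp;PageNumber=2">Next</a>
<a href="search.asp">New search</a>
"""

if __name__ == "__main__":
    flyers, pages = extract_links(
        SAMPLE, "http://www3.fdle.state.fl.us/sexual_predators/result.asp")
    print(flyers)
    print(pages)
```

The crawl I want would repeat this for every page link found under A before moving on to B, and so on.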

Thanks for your help ... as a long-time user I LOVE this program and I am finding new uses for it daily!!!!!

Best Regards,

Steve
Oleg Chernavin 05/27/2005 03:21 am
Yes. I used your settings but replaced the URLs field contents with new lines, because the session ID had changed. Then I loaded the Project and saw that the flyers load correctly.

Oleg.