Ignore robots.txt - OEP 5.9

Author Message
Patricia 03/19/2014 09:51 am
Hi Oleg:

It has been a while, hope you are well.

I searched in the forums but didn't find an exact answer to my question. Does OEP have a setting to "ignore robots.txt" files? I have downloaded a site and it appears that the robots.txt at the host site has prevented OEP from grabbing style sheets. Also, when I tried to export it to see if that would improve the rendering, it froze after 5 files.

I've pasted the settings below. Thanks for your help!

[Object]
OEVersion=Pro 5.9.0.3254
Type=0
IID=7032
Caption=http://www.afghanistan.gc.ca/canada-afghanistan/menu.aspx
URL=http://www.afghanistan.gc.ca/canada-afghanistan/menu.aspx
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovflvfliflcvivrmramrvasfasxwmvm1vm2vvobsmilmp4
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexe
FTUDef.Exts=jsaxdcssssivbsdtdxslswfclassent
FTText.B=ooxooo
FTImages.B=ooxooo
FTVideo.B=ooxooo
FTAudio.B=ooxooo
FTArchive.B=ooxooo
FTUDef.B=ooxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,3,0
NotIgnoreLogout=False
RSrvsBx=1
RProt=255
LastStart=203:160:100:174:150:94:228:64:
LastEnd=59:68:24:54:153:94:228:64:
LastStarted=2014-03-18 5:00:39 PM
LastEnded=2014-03-18 6:54:30 PM
S200=4162
S400=66
SPar=3047
SSav=4162
SLast=200
SSiz=10301109123
SMdf=4105
SHTML=2399
SSuccDowns=1
LFiles=4226
LSize=10301164218
ImgDim=0,0,0,0
PrevURL=http://www.afghanistan.gc.ca/canada-afghanistan/menu.aspx
ConvertRSS=True
Exported=2014-03-19 9:19:47 AM - W:\patty.klambauer\Download\afghanistan-export-mar19\
LIndexed=False
IndexFiles=False
Oleg Chernavin 03/19/2014 04:10 pm
Patricia,

It was not related to robots.txt at all. I fixed an error in Offline Explorer. Here is the updated version:

http://www.metaproducts.com/download/betas/opsetup.exe

Thank you!

Best regards,
Oleg Chernavin
MP Staff
Patricia 03/20/2014 10:33 am
Thanks Oleg. I'll install it and give it a whirl.

I'm curious: how does OEP normally treat robots.txt files? Does it ignore them? And is there a setting that we can switch on and off for robots.txt?
Oleg Chernavin 03/20/2014 11:05 am
I see no sense in that. Offline Explorer doesn't care about robots.txt. Only you define what should be downloaded and what should be skipped.

Oleg.
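
For comparison, a crawler that does honor robots.txt would skip any URL matching a Disallow rule. The sketch below uses Python's standard urllib.robotparser to show what such a check looks like. The robots.txt content and the "ExampleCrawler" user agent are hypothetical, made up for illustration; they are not taken from the actual site, and nothing here describes how Offline Explorer works internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /iwglobal/
"""


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "ExampleCrawler") -> bool:
    """Return True if a robots.txt-respecting crawler may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.afghanistan.gc.ca/canada-afghanistan/menu.aspx",
        "http://www.afghanistan.gc.ca/iwglobal/frameworks/css/custom.css",
    ):
        print(url, allowed_by_robots(HYPOTHETICAL_ROBOTS_TXT, url))
```

A downloader that never performs this check, as described above for Offline Explorer, fetches both URLs regardless of what robots.txt says.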
Patricia 03/26/2014 03:15 pm
Hi Oleg: I installed the new version of OEP that you sent me and ran the same project settings, but I am still getting the same poor results, even after export. Could you have another look? I'm pasting the settings below. Much appreciated!

[Object]
OEVersion=Pro 6.8.4085
Type=0
IID=7032
Caption=http://www.afghanistan.gc.ca/canada-afghanistan/menu.aspx
URL=http://www.afghanistan.gc.ca/canada-afghanistan/menu.aspx
Lev=1000001
Weekday=257
LTExceptions=
LTExcMode=0
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovflvfliflcvivrmramrvasfasxwmvm1vm2vvobsmilmp4
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexe
FTUDef.Exts=jsaxdcssssivbsdtdxslswfclassent
FTText.B=ooxooo
FTImages.B=ooxooo
FTVideo.B=ooxooo
FTAudio.B=ooxooo
FTArchive.B=ooxooo
FTUDef.B=ooxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,3,0,0,0,0,0,0,0,0
NotIgnoreLogout=False
RSrvsBx=1
RProt=255
LastStart=252:188:247:12:112:95:228:64:
LastEnd=115:60:180:84:116:95:228:64:
PrjStart=148:125:165:232:78:95:228:64:
LastStarted=2014-03-25 12:02:16 PM
LastEnded=2014-03-25 3:14:53 PM
S200=4154
S400=19
SPar=3067
SSav=4154
SLast=200
SSiz=10299670534
SMdf=4096
SHTML=2414
SSuccDowns=4
LFiles=4170
LSize=10299688343
Flags=1
ImgDim=0,0,0,0
PrevURL=http://www.afghanistan.gc.ca/canada-afghanistan/menu.aspx
ConvertRSS=True
Exported=2014-03-25 3:25:34 PM - W:\patty.klambauer\Download\2014-03-25-afghanistan-export-1\
MapStats=1,199,1,199,0,0,0,0,0,0,0,0,0,0
Oleg Chernavin 03/26/2014 03:19 pm
I downloaded it and it looks OK browsing offline inside Offline Explorer and even after export.

Can you delete the downloaded Project and maybe download it to some other directory to check?

Oleg.
Patricia 03/26/2014 05:00 pm
Hi Oleg: I did as you suggested but no improvement. I should add that the type of work that I do requires that I *always* export. In this case, when I use the OEP browser, it does display well, but my exported version does not. Perhaps my export settings aren’t optimal?

I don’t think the export is necessarily the issue because there are lots of downloaded files that indicate “not found” in their file path. I have pasted a small sample below. Do you have the same “not found” files in your download?

Patricia


notfound.aspx@404_253b_2fcanada_international_2fimportant_notices.aspx.htm
notfound.aspx@404_253bhttp_3a_2f_2fwww.afghanistan.gc.ca_2fcanada-afghanistan_2fassets_2fimages_2fanca-class1.jpg
notfound.aspx@404_253Bhttp_3A_2F_2Fwww.afghanistan.gc.ca_2Fcanada-afghanistan_2Fassets_2Fpdfs_2FCanadaCondemnsKabulAttackDari.pdf
notfound.aspx@404_253Bhttp_3A_2F_2Fwww.afghanistan.gc.ca_2Fiwglobal_2Fframeworks_2Fcss_2Fcustom.css
notfound.aspx@404_253Bhttp_3A_2F_2Fwww.afghanistan.gc.ca_2Fiwglobal_2Fframeworks_2Fjs_2Fcss_2Fpe-ap-min.css
notfound.aspx@404_253Bhttp_3A_2F_2Fwww.afghanistan.gc.ca_2Fcanada-afghanistan_2Fwet-boew.skipnav.js
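
The file names above appear to be escaped URLs, and decoding them shows which resource each "not found" entry refers to. The sketch below is a guess at the naming scheme based only on these samples: it assumes "_XX" (two hex digits) stands for the byte %XX, that "@" replaces "?", and that the names are encoded twice (so "_253B" becomes "%3B" and then ";"). The function names are hypothetical, and this is not a documented Offline Explorer format.

```python
import re
from typing import Optional
from urllib.parse import unquote


def decode_saved_name(name: str) -> str:
    """Turn a saved file name back into a readable URL (assumed scheme)."""
    # Assumption: "_XX" with two hex digits encodes the byte %XX.
    percent_form = re.sub(r"_([0-9A-Fa-f]{2})", r"%\1", name)
    # Two passes, because the samples look double-encoded: _253B -> %3B -> ;
    decoded = unquote(unquote(percent_form))
    # Assumption: the first "@" stands for the "?" that starts the query string.
    return decoded.replace("@", "?", 1)


def original_target(name: str) -> Optional[str]:
    """Return the URL that follows "?404;", or None if the pattern is absent."""
    _, separator, target = decode_saved_name(name).partition("?404;")
    return target if separator else None


if __name__ == "__main__":
    sample = (
        "notfound.aspx@404_253Bhttp_3A_2F_2Fwww.afghanistan.gc.ca"
        "_2Fiwglobal_2Fframeworks_2Fcss_2Fcustom.css"
    )
    print(decode_saved_name(sample))
    print(original_target(sample))
```

Under these assumptions, the sample decodes to notfound.aspx?404;http://www.afghanistan.gc.ca/iwglobal/frameworks/css/custom.css, which resembles a server-side custom 404 page being handed the URL that was requested. That would suggest the server itself reported these style sheets, scripts, and images as missing. The underscore rule is ambiguous for names that contain a literal underscore followed by two hex digits, so the decoded result is best treated as a hint.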
Oleg Chernavin 03/27/2014 08:42 am
Please make sure that the "Use standard extensions..." box is checked in the Export dialog.

Does it make a difference?

Oleg.
Patricia 03/27/2014 09:35 am
Hi Oleg, The box is checked and it doesn't make a difference. I should clarify that the "not found" files exist in the original download, before exporting, and they remain after export. Do you have those files in your download?

Patricia
Oleg Chernavin 03/27/2014 10:04 am
Yes, sure. I also have them. Can you please ZIP the exported folder and send it to me via some filesharing service? I will compare to what I get after export.

Oleg.