downloading a site with exclusions and inclusions

wbtstm 11/30/2010 06:25 am
Hi,
I want to download this site: http://www.alhawali.com
but I don't want to download files that have any of these words in the link or in the file name:
home.cart
smartingo.com.sa
contactus
advsearch
actsearch
search
registration
smsreg
home.logout
home.mypage
mailinglist

I want to use these login credentials:
name: myname
pass: mypass

I tried to save my customized Project in Offline Explorer, but it is very slow (more than 1 hour).
Oleg Chernavin 11/30/2010 06:32 am
Please follow the advice in the other replies to your posts. Then simply log on to the site in the Internal browser of Offline Explorer and start the download.

Best regards,
Oleg Chernavin
MP Staff
wbtstm 12/02/2010 04:23 am
Thanks.

I also have this problem:

I can't download files from pages like:

http://www.alhawali.com/index.cfm?method=home.download&contentid=1&type=word

The download link is available only in Internet Explorer; OE doesn't get it when I try.

The download link looks like this:

index.cfm?method=home.downloadnow&type=word&cate=supportfiles&contentid=1

Thanks in advance.
Oleg Chernavin 12/02/2010 11:33 am
Give me the page URL that contains this link.

Oleg.
wbtstmoi 12/03/2010 02:14 pm
I gave you the URL:

http://www.alhawali.com/index.cfm?method=home.download&contentid=1&type=word

and this is the link that is impossible to download (it is in a JavaScript, I think):

http://www.alhawali.com/index.cfm?method=home.downloadnow&type=word&cate=supportfiles&contentid=1
Oleg Chernavin 12/04/2010 03:15 pm
OK. I improved script parsing. Thank you for pointing to this link!

Oleg.
wbtstmoi 12/05/2010 07:49 am
Do you mean that it is possible with my version,
that you have improved it and posted an update,
or that it will be possible in the future?
Oleg Chernavin 12/05/2010 08:18 am
Please simply add this URL to the URLs field of the Project. This will be enough. The next published version will include this fix.

Oleg.
wbtstmoi 12/05/2010 08:30 am
But I have a lot of files like this one, where both contentid and type change.
What should I do?
http://www.alhawali.com/index.cfm?method=home.downloadnow&type=word&cate=supportfiles&contentid=2
or
http://www.alhawali.com/index.cfm?method=home.downloadnow&type=zip&cate=supportfiles&contentid=300
Also, will the links point to the offline copies? If not, how do I make the links in the HTML files downloaded with OE point to them?
Oleg Chernavin 12/05/2010 09:20 am
True. Here is the updated oe.exe file:

http://www.metaproducts.com/download/betas/OEP3292.ZIP

Oleg.
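If the parser still misses some of these links, a fallback in line with Oleg's earlier advice (add the URL to the Project's URLs field) is to generate the list of download URLs and paste it into that field. A minimal Python sketch, assuming the URL format from the links above stays the same; the (type, cate, contentid) combinations below are hypothetical samples, since the real list has to come from the site.

```python
from urllib.parse import urlencode

BASE = "http://www.alhawali.com/index.cfm"


def downloadnow_url(file_type: str, cate: str, content_id: int) -> str:
    # Keep the parameter order seen in the thread: method, type, cate, contentid.
    query = urlencode({
        "method": "home.downloadnow",
        "type": file_type,
        "cate": cate,
        "contentid": content_id,
    })
    return f"{BASE}?{query}"


def build_url_list(entries) -> str:
    # entries: iterable of (type, cate, contentid) tuples; one URL per line,
    # ready to paste into the Project's URLs field.
    return "\n".join(downloadnow_url(t, c, i) for t, c, i in entries)


if __name__ == "__main__":
    # Hypothetical sample entries; real contentid/type pairs must come from the site.
    sample = [("word", "supportfiles", 2), ("zip", "supportfiles", 300)]
    print(build_url_list(sample))
```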
wbtstmoi 12/05/2010 01:55 pm
Thank you very much.

I am using Offline Explorer Enterprise, but you gave me an update for Offline Explorer Pro.
I downloaded the file and overwrote oe.exe, but I kept a copy of the original in another place in case of problems.

Do you suggest that I restore it, or should I continue using Offline Explorer Pro?

And what about my Offline Explorer Enterprise registration?

After reading the offline browsers comparison, I think I will use OEP for a few days and return to OEE later.

If you have an update for OEE, I would really appreciate it.

Thank you in advance.
Oleg Chernavin 12/05/2010 02:59 pm
Oh, sorry. Please test this version, and if it gets all links, I will send you the Enterprise one tomorrow. If some links still fail, I will work to improve it more.

Oleg.
webstat 12/11/2010 01:42 pm
Hi,
I was testing the download, but when I tried to log in with Arabic characters, I got ?????????. What should I do?

Thank you in advance.

Oleg Chernavin 12/11/2010 02:07 pm
Try to log on in MS IE and check the box to remember you on this site.

Oleg.
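The ????????? symptom is consistent with the login text being sent in a code page that has no Arabic letters; that cause is an assumption, not something confirmed in this thread. This Python sketch (standard library only, hypothetical login name) shows how encoding Arabic text with a non-Arabic codec and replacement turns every letter into ?, while UTF-8 or the Windows Arabic code page cp1256 keep it intact:

```python
# Hypothetical login name; any Arabic text behaves the same way.
NAME = "\u0645\u062d\u0645\u062f"


def encode_form_value(text: str, codec: str) -> bytes:
    # errors="replace" substitutes b"?" for every character the codec cannot
    # represent, which matches the ????? seen after the login attempt.
    return text.encode(codec, errors="replace")


if __name__ == "__main__":
    print(encode_form_value(NAME, "latin-1"))  # b'????'
    print(encode_form_value(NAME, "utf-8"))    # 2 bytes per Arabic letter
    print(encode_form_value(NAME, "cp1256"))   # Windows Arabic code page
```

Logging on in MS IE, as suggested above, avoids typing the Arabic name inside Offline Explorer at all.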
wbtstmoi 12/11/2010 03:28 pm
I wanted to download part of the site with my version of OEE,

and at the end replace OEE with the OE Pro that you gave me and run "download missing files".
Would I get the same result as downloading the same part of the site with OE Pro?

Thank you very much. You have helped me a lot, and I'm sorry for bothering you with my bad English and silly questions.
wbtstmoi 12/11/2010 03:54 pm
I forgot to tell you that I can't skip files like this one:
http://www.alhawali.com/index.cfm@method=home.GuestBookXXXXXXXXXXXXXXXXXXXXX
I could not do it; please help me.
Oleg Chernavin 12/13/2010 09:37 am
What about adding

method=home.GuestBook

to the URL Filters - Filename - Excluded list?

Oleg.
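A minimal Python sketch of how a filename keyword filter like this can be modeled: take the last path segment plus the query string, and skip the URL if any keyword occurs in it. The substring rule and the case-insensitivity are assumptions for illustration, not Offline Explorer's documented matching logic; `file_part` and `is_excluded` are hypothetical names.

```python
from urllib.parse import urlsplit


def file_part(url: str) -> str:
    # Last path segment plus the query string, e.g. "dall-a?i=147".
    parts = urlsplit(url)
    name = parts.path.rsplit("/", 1)[-1]
    return f"{name}?{parts.query}" if parts.query else name


def is_excluded(url: str, keywords) -> bool:
    # Assumed model: skip the URL if any keyword appears anywhere in its
    # file part (plain substring match, case-insensitive).
    name = file_part(url).lower()
    return any(k.lower() in name for k in keywords)


if __name__ == "__main__":
    kw = ["method=home.GuestBook", "dall-a?i="]
    print(is_excluded("http://www.tawhed.ws/dall-a?i=147", kw))
    print(is_excluded("http://www.alhawali.com/index.cfm?method=home.GuestBookABC", kw))
```

With this model, a short fixed fragment such as method=home.GuestBook catches every GuestBook URL, whatever follows it.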
wbtstmoi 12/13/2010 12:21 pm
The same thing happens with this link:

http://www.tawhed.ws/dall-a?i=147
or
http://www.tawhed.ws/dall-a?i=xxx
where xxx changes.
I could not exclude it.

Thank you in advance.
Oleg Chernavin 12/13/2010 03:03 pm
dall-a?i=

I think it is very simple and obvious.

Oleg.
wbtstmoi 12/23/2010 03:31 pm
What about these?

http://www.tawhed.ws/sn?name=%d8%b4%d9%81%d8%a7%d8%a1%20%d8%a7%d9%84%d9%86%d9%81%d9%88%d8%b042f9481a2.htm

http://www.tawhed.ws/faq/register?pageqa=&page=&qid=2749.htm

http://www.tawhed.net/dall-a.php?i=23

and similar ones.


I added these to the excluded file names:
register?pageqa=
register
sn?name=
sn.php?name=
sn.php
sn
dall-c.php?i=
dall-a.php?i=

but the files are still queued and downloaded; I am 100% sure.
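The first URL above looks opaque because its query is percent-encoded UTF-8 (Arabic text). A quick Python check decodes it and shows that the keyword sn?name= appears literally in the raw URL, so the encoding by itself should not stop a substring filter from matching; the later replies trace the problem to the Project settings instead. Only `urllib.parse.unquote` from the standard library is used; the helper names are hypothetical.

```python
from urllib.parse import unquote

URL = ("http://www.tawhed.ws/sn?name=%d8%b4%d9%81%d8%a7%d8%a1%20"
       "%d8%a7%d9%84%d9%86%d9%81%d9%88%d8%b042f9481a2.htm")


def readable(url: str) -> str:
    # unquote() decodes %XX sequences as UTF-8 by default.
    return unquote(url)


def keyword_in_raw_url(url: str, keyword: str) -> bool:
    # The keyword is plain ASCII and sits before the encoded part,
    # so it is found in the raw URL without any decoding.
    return keyword in url


if __name__ == "__main__":
    print(readable(URL))
    print(keyword_in_raw_url(URL, "sn?name="))
```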
Oleg Chernavin 12/23/2010 03:31 pm
Please send me the exact settings of this Project: Export - Project settings - Copy, then paste them into the message.

Oleg.
wbtstmoi 12/23/2010 04:04 pm
Stream 1.2 File
[Object]
OEVersion=Enterprise 5.9.0.3284
Type=0
IID=7010
Caption=taw
URL=http://www.tawhed.ws/http://www.tawhed.net/http://mtj.fm/http://www.mtj.fm/
MVer=5
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
CheckSize=True
SkipMedia=True
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovflvfliflcvivrmramrvasfasxwmvm1vm2vvobsmilmp4
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexe
FTUDef.Exts=jscssssivbsdtdxslswfclassent
FTText.B=ooxooo
FTImages.B=ooxooo
FTVideo.B=ooxooo
FTAudio.B=ooxooo
FTArchive.B=ooxooo
FTUDef.B=ooxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,1,1
NotIgnoreLogout=False
RSrvsBx=1
RFileEx=registerloginfindthissn.php?name xxxx
RProt=255
LastStart=186:30:55:215:113:201:227:64:
LastEnd=169:141:18:11:112:201:227:64:
LastStarted=11/12/2010 13:22:49
LastEnded=11/12/2010 12:01:56
S304=819
SPar=17
SLast=304
LFiles=12977
Flags=1
ImgDim=0,0,0,0
PrevURL=http://www.tawhed.ws/
SkipURLs=contactdall-findthisforgotloginregistersearchsn.php?namesn?name
ConvertRSS=True
LIndexed=False
IndexFiles=False
wbtstmoi 12/23/2010 04:05 pm
However,
I excluded the others in OEP,
and I don't know what xxxx means.

Thank you in advance.
Oleg Chernavin 12/23/2010 05:07 pm
I see now. Please do not use the URL Omissions section. Use the URL Filters - Filename - Excluded list only. The URL Omissions section is for full URLs only.

Regarding the xxxx in the Project settings: it means the checked/unchecked state.

Oleg.
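The difference Oleg describes can be modeled as two rules: an omission entry must be a complete URL, while a filename keyword only needs to appear in the file part. A Python sketch under those assumptions (exact equality for omissions is a guess at what "full URLs only" means; the function names are hypothetical), using the sn?name entry the first Project had placed in the omissions list:

```python
from urllib.parse import urlsplit


def omitted(url: str, omission_list) -> bool:
    # Assumed model of URL Omissions: an entry must equal the complete URL.
    return url in set(omission_list)


def filename_excluded(url: str, keywords) -> bool:
    # Assumed model of URL Filters - Filename - Excluded: a keyword only has
    # to appear in the last path segment plus query string.
    parts = urlsplit(url)
    name = parts.path.rsplit("/", 1)[-1]
    if parts.query:
        name = f"{name}?{parts.query}"
    return any(k.lower() in name.lower() for k in keywords)


if __name__ == "__main__":
    url = "http://www.tawhed.ws/sn?name=abc"
    print(omitted(url, ["sn?name"]))            # keyword in omissions: no match
    print(filename_excluded(url, ["sn?name"]))  # same keyword as filename filter
```

Under this model, fragments such as sn?name or register in the omissions list never match anything, which fits the behavior reported above.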
wbtstmoi 12/23/2010 05:18 pm
Yes,
I forgot to remove them after you answered my question,

but I still could not remove the files listed previously.
Oleg Chernavin 12/23/2010 05:48 pm
Just use the Project Map to delete them. Changing Project settings only affects what you download. If you exclude some URL, the corresponding file will not be removed from the disk; it will simply be skipped in the next downloads.

Oleg.
wbtstmoi 12/24/2010 01:18 am
What I meant is that I could not exclude them with Offline Explorer Enterprise.

I deleted them manually from my HDD, but I don't want them to be parsed.
Here are my latest Project settings with Offline Explorer Pro:

[Object]
OEVersion=Pro 5.9.0.3292
Type=0
IID=7012
Caption=taw
URL=http://www.tawhed.ws/http://www.tawhed.net/http://www.mtj.fm/
MVer=5
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
CheckSize=True
SkipMedia=True
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovflvfliflcvivrmramrvasfasxwmvm1vm2vvobsmilmp4
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexe
FTUDef.Exts=jscssssivbsdtdxslswfclassent
FTText.B=ooxooo
FTImages.B=ooxooo
FTVideo.B=ooxooo
FTAudio.B=ooxooo
FTArchive.B=ooxooo
FTUDef.B=ooxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,1,1
NotIgnoreLogout=False
RSrvsBx=1
RPathEx=%&ovr0%22..%22http: xxx
RFileEx=registerloginfindthisregister?pageqa=snsn.phpsn?name=sn.php?name=dall-cdall-c?i=dall-c.php?i=dall-adall-a?i=dall-a.php?i=section-archivescholar-archive xxxxxxxxxxxxxxxx
RProt=255
LastStart=208:202:149:226:0:203:227:64:
LastEnd=57:153:164:146:9:203:227:64:
LastStarted=24/12/2010 00:39:49
LastEnded=24/12/2010 07:10:46
S304=28327
S400=49
SPar=28327
SLast=404
SSuccDowns=1
LFiles=28376
LSize=22611
Flags=1
SubstsB=aHR0cDovL3Rhd2hlZAl0YXdoZWQJd3d3LnRhd2hlZA0KaHR0cDovL210agltdGoJd3d3Lm10ag0K
ImgDim=0,0,0,0
ConvertRSS=True
LIndexed=False
IndexFiles=False
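The SubstsB value in these settings is Base64. Decoding it with Python's standard library shows the URL Substitutes stored in the Project: one rule per CRLF-terminated line, with tab-separated fields. Reading the three fields as URL / change / with (the fields of the URL Substitutes dialog mentioned later in this thread) is an assumption about the format.

```python
import base64

SUBSTS_B = "aHR0cDovL3Rhd2hlZAl0YXdoZWQJd3d3LnRhd2hlZA0KaHR0cDovL210agltdGoJd3d3Lm10ag0K"


def decode_substitutes(blob: str):
    # Base64 -> text; each non-empty CRLF-separated line is one rule whose
    # fields are separated by tabs (assumed order: URL, change, with).
    text = base64.b64decode(blob).decode("utf-8")
    return [tuple(line.split("\t")) for line in text.split("\r\n") if line]


if __name__ == "__main__":
    for url, change, replacement in decode_substitutes(SUBSTS_B):
        print(f"URL {url!r}: change {change!r} with {replacement!r}")
```

Decoded, the two rules read: for URLs starting with http://tawhed, change tawhed to www.tawhed; for http://mtj, change mtj to www.mtj.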

Oleg Chernavin 12/24/2010 10:49 am
File Filters - Others - Select Load using URL Filters settings.

Oleg.
wbtstmoi 01/03/2011 06:21 pm
Thank you very much.

I found 2 files named 7.rm, but their contents are different and I want to download both.
With OEE I get only one file.

The links are:
http://www.alhawali.com/index.cfm?method=home.downloadnow&type=rm&cate=audiofiles&contentid=571
http://www.alhawali.com/index.cfm?method=home.downloadnow&type=rm&cate=audiofiles&contentid=4211

Thank you in advance.

And the Offline Explorer Pro beta version worked great. Could you please send me the Enterprise one when possible?
Oleg Chernavin 01/05/2011 09:00 am
Here it is:

http://www.metaproducts.com/download/betas/OEE3307.ZIP

Oleg.
wbtstmoi 01/05/2011 02:16 pm
Thank you very much.

You have not suggested anything for my question about

2 files with the same name.
Oleg Chernavin 01/06/2011 10:50 am
Sorry. The only way is to add the following line to the Project's URLs field:

Additional=SkipDisposition

But this will not create 7.rm file at all - the filenames will be made only according to the URLs.

Oleg.
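Oleg's note implies that the name 7.rm comes from the server's Content-Disposition header rather than from the URL; that is an inference, and the header value below is assumed. A Python sketch contrasting the two naming sources: the header gives the same name for both downloads, while a URL-based name differs because contentid differs. The URL-based scheme (last path segment, @, query) is hypothetical, loosely modeled on the index.cfm@method=... form seen earlier in the thread.

```python
from email.message import Message
from urllib.parse import urlsplit

URLS = [
    "http://www.alhawali.com/index.cfm?method=home.downloadnow&type=rm&cate=audiofiles&contentid=571",
    "http://www.alhawali.com/index.cfm?method=home.downloadnow&type=rm&cate=audiofiles&contentid=4211",
]
HEADER = 'attachment; filename="7.rm"'  # assumed server header for both URLs


def name_from_disposition(header_value: str):
    # Parse a Content-Disposition value and return its filename parameter.
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()


def name_from_url(url: str) -> str:
    # Hypothetical URL-based naming: last path segment, "@", query string.
    parts = urlsplit(url)
    base = parts.path.rsplit("/", 1)[-1]
    return f"{base}@{parts.query}" if parts.query else base


if __name__ == "__main__":
    for u in URLS:
        print(name_from_disposition(HEADER), "|", name_from_url(u))
```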
wbtstmoi 01/06/2011 02:31 pm
Thank you,

but couldn't I download the first one into one directory and the second into another, or rename the second one to 7_1.rm or similar?
Oleg Chernavin 01/07/2011 04:25 am
No, you can't. I am thinking about how to improve the code to handle such cases.

Oleg.
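For reference, the renaming proposed above (7.rm, then 7_1.rm) is a simple collision-avoidance rule. Per Oleg's reply, the version discussed here does not do this; the Python sketch below only illustrates the naming idea, and `unique_name` is a hypothetical helper.

```python
import os


def unique_name(name: str, taken: set) -> str:
    # Return name unchanged if it is free; otherwise insert _1, _2, ...
    # before the extension until an unused name is found.
    if name not in taken:
        taken.add(name)
        return name
    stem, ext = os.path.splitext(name)
    n = 1
    while f"{stem}_{n}{ext}" in taken:
        n += 1
    candidate = f"{stem}_{n}{ext}"
    taken.add(candidate)
    return candidate


if __name__ == "__main__":
    used = set()
    print(unique_name("7.rm", used), unique_name("7.rm", used))
```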
wbtstm 01/07/2011 04:37 am
I will wait until then.

Thank you very much for your help.
wbtstmoi 01/07/2011 08:57 am
I have not tested this yet, but I hope it does this:
for example, in the download directory OEE creates a folder audiofiles and a subfolder 571, and saves the file pointed to by this URL:
http://www.alhawali.com/index.cfm?method=home.downloadnow&type=rm&cate=audiofiles&contentid=571

What if I do this:

In URL Substitutes:
URL:
http://www.alhawali.com/index.cfm?method=home.downloadnow
change:
***cate=**&contentid=*
with:
**\*\
without checking the box.

Does that solve my problem or not?
wbtstmoi 01/07/2011 09:01 am
I have not tested this yet, but I hope it does this,
for example:
in the download directory OEE creates
a folder audiofiles and a subfolder 571, with downloadnow571 and 7.rm in that subfolder, for the URL
http://www.alhawali.com/index.cfm?method=home.downloadnow&type=rm&cate=audiofiles&contentid=571

In URL Substitutes:
URL: http://www.alhawali.com/index.cfm?method=home.downloadnow
change: ***cate=**&contentid=*
with: **\*\downloadnow*
without checking the box.

Will that work?
Oleg Chernavin 01/07/2011 01:49 pm
Yes, try that; perhaps it will work out.

Oleg.
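The mapping both posts aim for (from the downloadnow URL to audiofiles/571/downloadnow571, as reported later in the thread) can be expressed as a regular expression. This Python sketch uses `re` as a stand-in; it is not Offline Explorer's wildcard syntax, and the pattern assumes the parameter order type, cate, contentid seen in the links above.

```python
import re

# Python regex stand-in for the wildcard rule above (not OE's own syntax).
PATTERN = re.compile(
    r"^(http://www\.alhawali\.com/)index\.cfm\?method=home\.downloadnow"
    r"&type=[^&]+&cate=([^&]+)&contentid=(\d+)$"
)


def substitute(url: str) -> str:
    # Rewrite .../index.cfm?...&cate=C&contentid=N to .../C/N/downloadnowN;
    # URLs that do not match the pattern are returned unchanged.
    return PATTERN.sub(r"\1\2/\3/downloadnow\3", url)


if __name__ == "__main__":
    print(substitute("http://www.alhawali.com/index.cfm?method=home.downloadnow"
                     "&type=rm&cate=audiofiles&contentid=571"))
```

Because each contentid gets its own folder, two files that the server names 7.rm would no longer share a directory, provided the downloaded file is then saved inside that folder.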
wbtstm 01/10/2011 12:20 pm
Hi,

I tried it, and it worked the first time.

Original link:
http://www.alhawali.com/index.cfm?method=home.downloadnow&type=rm&cate=audiofiles&contentid=571
Link with the substitute (not checked in OEE):
http://www.alhawali.com/audiofiles/571/downloadnow571
The link in the queue before quitting is the original one.
After quitting and running OEE again with Ctrl+F5 and F9, I get the second one in the queue.

What should I do?

I tried to redownload the site many times but got the same problem.
wbtstm 01/11/2011 02:55 pm
Hi,

While waiting for your response to my last question in this thread, I started reading other topics to learn more about OEE and ask less.

I found out that you were ill. Are you fine now?

I am so sorry for causing the fever... :)

Forgive my bad English;

I don't know how to speak it well.
Oleg Chernavin 01/15/2011 08:48 am
Sorry for the late answer. Yes, this is an expected effect of Ctrl+F5. When you use URL Substitutes, the modified links are written into the HTML files, and when those files are processed on the next download, the modified links are followed, not the original ones.

Oleg.
wbtstm 01/16/2011 11:15 am
Hi,

No problem.

You have helped me a lot.

However, couldn't I parse only the original links? (I don't mind downloading another copy that keeps the original links to follow for later parsing.)

Thank you very much.
Oleg Chernavin 01/16/2011 11:31 am
Yes, using the "Download new and modified files" or Shift+F5.

Oleg.
wbtstm 01/21/2011 02:04 pm
Thank you very much,

I successfully downloaded the site with the beta version of OEE, using URL Substitutes and additional=keepprimary in the Project's URLs field.

Oleg Chernavin 01/24/2011 08:05 am
This is good!

Oleg.