Downloading a site: other questions 7/7 +1/3

wbtstmoi 12/13/2010 02:35 pm
1) How do I avoid downloading files that are the same from http://www.site.com and http://site.com, while downloading all the others?

2) You have not answered this question from my post "downloading a site with exclusions and inclusions 3/7":
"""
I wanted to partially download the site with my version of OEE,
and at the end replace OEE with the OE Pro that you gave me and check Download Missing Files.
Would I get the same result if I downloaded the same part of the site with OE Pro?
"""

3) How can I get a list of all the missing files or links in my Project?

4) How can I skip parsing *.zip files but still download them?

5) How can I skip downloading and parsing links with &id=1 through &id=100, and download all the others?

6) How can I see all the websites referenced by the website I want to download, so I can specify which ones to download and which ones not to?
Oleg Chernavin 12/13/2010 03:10 pm
1. I will add this feature later.

2. Answered. Please be patient - I can't answer requests immediately.

3. Select the Project, press Ctrl+F5, then F9. Wait for parsing to complete and see the Queue tab.

4. ZIP files are not parsed. Only downloaded.

5. Properties - URL Filters - Filename - Excluded list (a sketch of this mask's effect follows below):

&id=[0..9]

6. What do you mean?
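
To make the mask in item 5 concrete, here is a minimal Python sketch of its effect. It assumes that [0..9] in Offline Explorer's filter syntax matches a run of digits; the regex below only illustrates the mask, it is not OE's matching engine.

import re

# rough equivalent of the OE exclusion mask &id=[0..9]
EXCLUDED = re.compile(r"&id=\d+")

urls = [
    "http://site.com/view.php?cat=2&id=37",    # excluded
    "http://site.com/view.php?cat=2&name=ab",  # kept
]
print([u for u in urls if not EXCLUDED.search(u)])

Note that such a mask excludes every numeric id, not only 1 through 100; limiting it to that exact range would need a stricter pattern such as &id=(100|[1-9][0-9]?)(?!\d).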

Best regards,
Oleg Chernavin
MP Staff
wbtstmoi 12/13/2010 11:07 pm
Thank you very much.

However, regarding point 4: I have a ZIP file that is parsed by Offline Explorer Enterprise and produces wrong links with non-alphabetic characters. I will give you the URL of this ZIP in a few days.

As for point 6, you answered it in point 3.
Oleg Chernavin 12/14/2010 07:30 am
Yes, I need to look at this URL to reproduce and fix it. Actually, it could be a web page with a ZIP extension. That happens on some sites, so extensions are not that reliable.

Oleg.
wbtstmoi 12/14/2010 10:35 am
No, it's a ZIP file with MP3s in it.
I have the URL but could not download or access it,
so I am searching for the URL that refers to it.

The URL is

http://tawhed.net/StatUmah.zip

and I am sure that I downloaded it with Offline Explorer Enterprise two days ago.
wbtstmoi 12/14/2010 10:43 am
7) Can I download ZIPs from subdirectories only, and not from the root? For example, download
http://tawhed.net/downloaddirectory/StatUmah.zip
but not
http://tawhed.net/StatUmah.zip

I have lots of links in the root that I don't want to download, and I don't know how to set this up.
Oleg Chernavin 12/14/2010 11:39 am
Yes, there is no such file on the site. I need to know its referrer to understand where this link comes from. You may start the download and watch the Queue; once such a link appears there, copy its referrer so we can track it.
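
As an illustration of what tracking the referrer buys you, here is a minimal Python sketch of a download queue that remembers where each link was discovered (my own illustration, not Offline Explorer's internals):

from collections import deque

queue = deque()
referrer_of = {}  # url -> page the link was found on

def enqueue(url, referrer):
    """Queue a URL once, remembering where it was discovered."""
    if url not in referrer_of:
        referrer_of[url] = referrer
        queue.append(url)

enqueue("http://tawhed.net/StatUmah.zip", "http://tawhed.net/dll2.php?i=StatUmah")
print(referrer_of["http://tawhed.net/StatUmah.zip"])  # where the link came from

With such a map, any bogus URL in the queue can be traced back to the page that produced it.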

Oleg.
wbtstmoi 12/14/2010 12:50 pm
I searched for it, not with OEE but with another piece of software.
The link is http://tawhed.net/dll2.php?i=StatUmah
Oleg Chernavin 12/14/2010 01:49 pm
Yes, this is correct. The server says:

Content-Disposition: attachment; filename="StatUmah.zip";

And Offline Explorer saves this ZIP file under the StatUmah.zip name.

It is not parsed, because its MIME type is application/zip. I just downloaded this URL myself, and it is not parsed.
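
The decision Oleg describes can be reproduced with a small Python sketch (my own code, not Offline Explorer's): take the save name from the Content-Disposition header, and skip link parsing whenever the MIME type is not a text type.

import re
import urllib.request

TEXT_TYPES = {"text/html", "application/xhtml+xml", "text/plain"}  # assumed parsable types

def classify(url):
    """Return (save_name, mime_type, should_parse) for a URL."""
    with urllib.request.urlopen(url) as resp:
        mime_type = resp.headers.get_content_type()  # e.g. "application/zip"
        # e.g. Content-Disposition: attachment; filename="StatUmah.zip";
        disposition = resp.headers.get("Content-Disposition", "")
        m = re.search(r'filename="?([^";]+)"?', disposition)
        save_name = m.group(1) if m else url.rsplit("/", 1)[-1]
    return save_name, mime_type, mime_type in TEXT_TYPES

print(classify("http://tawhed.net/dll2.php?i=StatUmah"))
# expected: ('StatUmah.zip', 'application/zip', False) - saved under that name, never parsed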

Oleg.
wbtstmoi 12/14/2010 02:03 pm
I am a newbie, but I am sure that the referrers of the wrong links are the files I gave you. I can save the Queue in OEE and upload the image if you want.
I will do so within 7 days at most, because I have a lot of work.
Oleg Chernavin 12/14/2010 02:03 pm
OK.

Oleg.
wbtstmoi 12/23/2010 02:14 pm
Regarding question 1, I found the solution: use URL Substitutes in the Parsing section.
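
The substitution idea amounts to host normalization: rewrite every parsed link to one canonical host before it is queued, so the www and non-www spellings collapse into a single download. A minimal Python sketch under that assumption (my helper, not OE's URL Substitutes feature):

from urllib.parse import urlsplit, urlunsplit

def canonicalize(url, bare="tawhed.net", preferred="www.tawhed.net"):
    """Map the bare host onto the preferred one so both spellings dedupe."""
    parts = urlsplit(url)
    if parts.netloc.lower() in (bare, preferred):
        parts = parts._replace(netloc=preferred)
    return urlunsplit(parts)

seen = {canonicalize(u) for u in ("http://tawhed.net/StatUmah.zip",
                                  "http://www.tawhed.net/StatUmah.zip")}
print(seen)  # one entry, so the file is downloaded only once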
wbtstmoi 12/23/2010 02:41 pm
Hi,

here is an example of parsing downloaded files
(Ctrl+F5, then F9):

http://www.tawhed.net/OQ8s-al)Oeuf’«l*?i0?1e†Wm
http://www.tawhed.net/B*°eoJuOA?M+ btoj»u?.?‰­M??
http://www.tawhed.net/Uv?i?Y•`??o­›°¬Xa&?VUi¬I5Xhxzdc<Na®.i

The referrers are, respectively:

http://www.tawhed.net/1211091k.zip
http://www.tawhed.net/1709091a.zip
http://www.tawhed.net/0907101j.zip

Forgive me for taking more than 7 days.
Oleg Chernavin 12/23/2010 02:47 pm
I can't reproduce this. I made a Project with the URL:
http://tawhed.net/dll2.php?i=StatUmah

level=unlimited

It loaded 1 ZIP file, so there are two files in total:
dll2.php?i=StatUmah
StatUmah.zip

The first one contains only one link, to the ZIP.

I start with Ctrl+F5 and get the following in the log:

QUEUE - 23.12.2010 22:45:01 - Parsing (0). http://tawhed.net/dll2.php?i=StatUmah
QUEUE - 23.12.2010 22:45:01 - Parsing end.
QUEUE - 23.12.2010 22:45:01 - Parsing files added.

No weird link is added and the actual ZIP is not parsed. How can I reproduce the behavior you see?

Oleg.
wbtstmoi 12/23/2010 03:09 pm
I click Download, then click Pause, and wait. Up to around parsing item 2500 I get links that I excluded but that are still in the queue; after 2683 files I get some wild links.

Here are the Project settings. I deleted the wild characters and some lines; the rest is unmodified:

OEVersion=Enterprise 5.9.0.3284
Type=0
IID=7010
URL=http://www.tawhed.ws/ http://www.tawhed.net/ http://mtj.fm/ http://www.mtj.fm/
MVer=5
Lev=1000001
Weekday=257
CheckSize=True
SkipMedia=True
FTText.Exts=all
FTImages.Exts=all
FTVideo.Exts=all
FTAudio.Exts=all
FTUDef.Exts=all
RSrvsBx=1
RProt=255
LastStart=186:30:55:215:113:201:227:64:
LastEnd=169:141:18:11:112:201:227:64:
S304=819
SPar=17
SLast=304
LFiles=12977
Flags=1
ImgDim=0,0,0,0
SkipURLs=contact findthis forgot login register search sn.php?name sn?name
Oleg Chernavin 12/23/2010 03:35 pm
Please simply paste it unmodified. These characters are necessary.

Oleg.
wbtstmoi 12/23/2010 04:06 pm
Stream 1.2 File
[Object]
OEVersion=Enterprise 5.9.0.3284
Type=0
IID=7010
Caption=taw
URL=http://www.tawhed.ws/http://www.tawhed.net/http://mtj.fm/http://www.mtj.fm/
MVer=5
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
CheckSize=True
SkipMedia=True
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovflvfliflcvivrmramrvasfasxwmvm1vm2vvobsmilmp4
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexe
FTUDef.Exts=jscssssivbsdtdxslswfclassent
FTText.B=ooxooo
FTImages.B=ooxooo
FTVideo.B=ooxooo
FTAudio.B=ooxooo
FTArchive.B=ooxooo
FTUDef.B=ooxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,1,1
NotIgnoreLogout=False
RSrvsBx=1
RFileEx=registerloginfindthissn.php?name xxxx
RProt=255
LastStart=186:30:55:215:113:201:227:64:
LastEnd=169:141:18:11:112:201:227:64:
LastStarted=11/12/2010 13:22:49
LastEnded=11/12/2010 12:01:56
S304=819
SPar=17
SLast=304
LFiles=12977
Flags=1
ImgDim=0,0,0,0
PrevURL=http://www.tawhed.ws/
SkipURLs=contactdall-findthisforgotloginregistersearchsn.php?namesn?name
ConvertRSS=True
LIndexed=False
IndexFiles=False
Oleg Chernavin 12/23/2010 05:46 pm
I made the download and during it found the URLs:

http://www.tawhed.net/dll2.php?i=1211091k
http://www.tawhed.net/dll2.php?i=1709091a

I watched them during the download and saw that the ZIP files were not parsed. Then I stopped the download and used Ctrl+F5, F9 to see whether they would be parsed. The Log window didn't indicate any parsing. Then I went through the whole Queue trying to find the weird URLs you described; none of them appeared at all.

I think the problem could be that you deleted the descr.wd3 files from the disk. They contain important information: MIME types and modification dates. If you did so, it is possible that the downloaded files get parsed.
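
To see why the missing descr.wd3 matters, here is a guess at the fallback mechanism, sketched in Python (not OE's actual code): once the recorded MIME type is gone, the only hint left is the URL's apparent extension, and a download script like dll2.php?i=StatUmah then looks like a parsable page.

import posixpath
from urllib.parse import urlsplit

BINARY_EXTS = {".zip", ".rar", ".mp3", ".pdf", ".exe"}  # never worth parsing

def should_parse(url, stored_mime=None):
    """stored_mime stands in for the MIME type kept in descr.wd3."""
    if stored_mime is not None:
        return stored_mime.startswith("text/")  # trust the recorded type
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    # without the record, .php looks like a page, even though the
    # server actually returns raw ZIP bytes for this URL
    return ext not in BINARY_EXTS

print(should_parse("http://tawhed.net/dll2.php?i=StatUmah", "application/zip"))  # False: skipped
print(should_parse("http://tawhed.net/dll2.php?i=StatUmah"))                     # True: the bug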

Oleg.
wbtstmoi 12/24/2010 01:06 am
Yes, I deleted them accidentally. How can I recover them or make new ones with OEE?

I am so sorry for taking your time...
Oleg Chernavin 12/24/2010 10:46 am
The only way is to redownload the site. Offline Explorer will get this information again.

Oleg.
wbtstmoi 01/04/2011 01:04 pm
Do I have to download the whole site, or only the HTML pages?

I ask because I have not downloaded the ZIP files and similar with OE

(as I told you in my previous posts).

If my only problem is with some ZIPs and I redownload them with OE, does that recreate descr.wd3?
wbtstmoi 01/04/2011 03:35 pm
After browsing and downloading lots of sites, I often get this error with various files:

Parser error 2: Access violation at address 0040831F in module 'OE.exe'. Write of address 00000004 URL: http://s203841464.onlinehome.us/waqfeya/books/16/1501.rar

I think this is the source of the wrong-URLs problem.
Note: I haven't deleted the descr.wd3 file.
Oleg Chernavin 01/05/2011 07:50 am
I tried to download the URL
http://s203841464.onlinehome.us/waqfeya/books/16/1501.rar

but it redirects me to http://www.waqfeya.com/.

Maybe I should specify some referrer?

Oleg.
wbtstm 01/07/2011 04:43 am
What about my previous question?

""
Do I have to download the whole site, or only the HTML pages?
I ask because I have not downloaded the ZIP files and similar with OE
(as I told you in my previous posts).
If my only problem is with some ZIPs and I redownload them with OE, does that recreate descr.wd3?
""

As for the last parsing error: I reinstalled Offline Explorer Enterprise and the problem is solved.

I think something went wrong when I copied the beta version that you gave me.

Thank you very much.

Oleg Chernavin 01/07/2011 01:52 pm
I think you may erase the ZIP files and use Ctrl+F5 to get them all again.

Oleg.
wbtstm 07/25/2011 04:03 am
I recently used SkipParsingFiles and it does the job.

Thank you very much, anyway.
Oleg Chernavin 07/25/2011 06:32 pm
What files did you disable from parsing? Perhaps it is better to improve the code so that such files are not parsed at all. Can you show some real URLs that should not be parsed? Thank you!

Oleg.
wbtstm 07/25/2011 08:43 pm
Hi again,

I skip parsing of archive, audio, video, and PDF-like files because the MIME type may not be available, due to an HDD problem or because descr.wd3 was deleted.


This matters for sites that serve this kind of ZIP file: the URL is XXXXXXXXX/dl?i=XXXX, for example, but the true file name is XXXX.zip or something else.
When parsing without descr.wd3, OEE thinks the URL points to a PHP or HTML file (that's what I understand) and tries to parse it, thus slowing down the parsing process and producing a lot of wrong URLs and access violation problems.
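
A more robust improvement, in the spirit of what Oleg asks about below, would be content sniffing: even without descr.wd3, the first few bytes identify most binary formats. A hedged Python sketch of that idea (my suggestion, not an existing OEE feature):

BINARY_SIGNATURES = (
    b"PK\x03\x04",  # ZIP / JAR
    b"Rar!",        # RAR
    b"%PDF",        # PDF
    b"ID3",         # MP3 with an ID3 tag
)

def looks_binary(first_bytes):
    """Recognize binary data by signature instead of trusting the URL."""
    return first_bytes.startswith(BINARY_SIGNATURES)

with open("StatUmah.zip", "rb") as f:  # hypothetical local copy
    if looks_binary(f.read(4)):
        print("binary file: save it, but never scan it for links")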

Maybe I am overdoing things, but this is my way of using OEE with this kind of site.
Oleg Chernavin 07/26/2011 04:14 pm
Yes, I understand. Please give me some examples of such URLs for further code improvements.

Oleg.