removing downloaded files from the project dir

Endless 08/17/2004 04:52 am
I'm doing a VERY large site rip and was wondering whether it's OK to remove some of the already downloaded files from the project directory. I've been downloading a few gigs a day, suspending to a file every day, and cleaning extra unneeded files out of the project directory every so often. Every time I resume from file, the number of queued files seems to grow too high, and I'm starting to get dupes. Does OE keep track of what it has already downloaded, or does it look in the project dir and use that to keep track? If needed I can stop moving the files out, since I have the HD space to hold them, but I reeeally don't want to end up redownloading the files I've moved.

Oleg Chernavin 08/17/2004 05:36 am
If you use "Suspend to file", Offline Explorer saves the info about the files it has already downloaded. You may notice the binary .wdqh file; it keeps that list.

Endless 08/17/2004 02:37 pm
That's good to know. One day it said there were 15,000 files in the queue, the next it said 17,000, and now it's up to 19,000. The more I downloaded, the more files ended up in the queue. I started to get really worried that OE thought it hadn't downloaded the files because they weren't there anymore. With filtering I expected only 10,000 or so files; it seems I was way wrong and there are about twice that many. 2,700 files downloaded, and that's 30 gigs. I'm a lil freaked now about how much the next 19,000 will add up to :-)

Oleg Chernavin 08/18/2004 01:39 am
Wow! Do you mean that each file size is 10 megabytes?

Endless 08/19/2004 01:43 am
> Wow! Do you mean that each file size is 10 megabytes?
They range from 4 meg to 23 meg, but 90% of them are 10-12 meg AVIs. There are thousands of zip files and who knows how many pictures too, but I'm filtering those out to hopefully get only AVIs. The CSVs I have gave me the impression of only about 10,000 AVI files, but I've already downloaded that many and still have almost twice as many in the queue. I'm not sure what's going on; looking inside the queue file, they're all AVIs. Guess I'll have to download them all to find out :-)

Oleg Chernavin 08/19/2004 04:12 am
Well, you can see all the files that are going to be downloaded using the Queue tab. You can easily abort unwanted URLs there.

Endless 08/19/2004 09:19 pm
There's a Queue tab!?!?!? Where the heck is that sucker? I'm using OE 3.3.1757

Oleg Chernavin 08/20/2004 02:29 am
It is in the Pro and Enterprise versions. I am sorry, I didn't realize that you are using the Standard version.

Endless 08/20/2004 03:09 am
Darn. It would be handy to see what's queued in a slightly more organized fashion than the queue file shows it. Oh well, no biggie.

I do wonder a little bit, though: the error window is filling up with "Error creating file: Error code=00000003 Attempts=0"

I'm assuming that's because OE managed to crash on me, so I had to resume again from the same suspend file it had crashed on before, and it's now trying to write files it had already downloaded before the crash.

So what is error code 00000003? (Enough zeros in that error code, btw?)

Oleg Chernavin 08/20/2004 03:41 am
Are there many files in that folder? If there are several thousand files, then the folder appears to be overloaded. Windows cannot hold too many files in one folder. There is a limit which depends on the filename lengths.

I would suggest upgrading to the Pro version of Offline Explorer, which has the "Prevent directories from overloading" feature in the Options dialog | File Locations section. It will help you with downloading this kind of site.
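
To check whether a project folder really is overloaded, one option is to count the files sitting directly in each directory. The following is only a minimal sketch, not part of Offline Explorer: the function names and the 5,000-file threshold are assumptions chosen for illustration, and the real limit depends on the file system and the filename lengths.

```python
import os
from collections import Counter


def count_files_per_dir(root):
    """Map each directory under root to the number of files directly inside it."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        counts[dirpath] = len(filenames)
    return counts


def crowded_dirs(root, threshold=5000):
    """Return (directory, file_count) pairs above threshold, largest first.

    The threshold is a hypothetical value, not a documented Windows limit.
    """
    counts = count_files_per_dir(root)
    return [(d, n) for d, n in counts.most_common() if n > threshold]
```

Calling `crowded_dirs` on the project directory (whatever path that is on your machine) lists the folders worth thinning out or splitting up.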

Endless 08/21/2004 01:11 am
It was about 11,000 files; it's now about 5,000 after I deleted 6,000 numbered files with no extension. All that's left are maybe 5,000 Descr.WD3 files. Do I really still need those? What about empty folders? How about this, to make it easy: is there ANYTHING in the project directory that OE would still need in order to avoid downloading dupes or already downloaded files?

Oleg Chernavin 08/21/2004 04:54 pm
Yes, the descr.wd3 files are necessary when you are about to update the site. They keep some important info, such as the modification dates of the files and their actual content type (MIME). So, to update the download, you need to have the downloaded files and the descr.wd3 files.

Endless 08/23/2004 05:20 am
And if I don't ever plan on updating? I've been resuming from file every day and suspending when I'm done. By the time I've done the whole site, my login will have expired.

Oleg Chernavin 08/23/2004 05:25 am
If you don't plan to update the site, you can safely remove all these directories yourself. Resume from file doesn't require any of these files.

Endless 08/24/2004 01:15 am
Goodie Goodie Goodie. There's a couple thousand or so files I can toss out.

Oleg Chernavin 08/24/2004 09:39 am

Endless 08/29/2004 06:29 am
I deleted all of the wd3 files and the empty directories that were left behind. And now, maybe coincidentally, I am starting to download a ton more dupes than before. 960 files out of 1,200 were dupes, wasting 10+ gigs of downloads. If I've been suspending to and resuming from a file, how could this be?

I'm down to needing only a few hundred more files to be complete. How many entries can I put in the URL or file filters? Would 100 entries be too many? A cool feature would be the ability to use CSV files to include or exclude files or directories to download.

Off to bed for me. I figured I'd ask about the URL filter limitations and hope for an answer when I wake up :-)

Endless 08/29/2004 06:53 am
SWEEEEEET!!! That help file is actually useful :-)

CTRL+L, and I loaded up a text file I had made and got the 97 entries I needed into the include list of the URL filters.

Will have to check tomorrow to see how many dupes I've got. Hopefully very few or none.
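
Counting the dupes by hand gets tedious at this scale, so a small script can do it. This is a minimal sketch that assumes a "dupe" means two files with byte-identical contents; it groups files by size first and hashes only the candidates. The function names are hypothetical and nothing here comes from Offline Explorer itself.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups of paths under root whose contents are byte-identical."""
    # Files of different sizes cannot be identical, so group by size first.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Hash only the files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Each returned group lists paths holding the same bytes, so all but one file in a group can be removed. Hashing tens of gigabytes of AVIs will take a while, which is why the size check comes first.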

Oleg Chernavin 08/29/2004 02:26 pm
Yes, Ctrl-L is helpful. There is no limit on the number of entries in URL Filters / File Filters. However, parsing files will get slower if you have too many items there, because each link that OE extracts from a file is first checked against the filters and all their rules.
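
The cost described above can be pictured with a small sketch: every extracted link is tested against every include entry, so the work grows with links times rules. This is not Offline Explorer's actual filter code or syntax. It assumes one keyword per line in a text file and a plain case-insensitive substring match, and all function names are hypothetical.

```python
def load_patterns(path):
    """Read one include keyword per line, skipping blanks, lowercased for matching."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip().lower() for line in handle if line.strip()]


def matches_any(url, patterns):
    """Return True if any keyword occurs in the URL (assumed substring rule)."""
    lowered = url.lower()
    return any(pattern in lowered for pattern in patterns)


def filter_links(links, patterns):
    """Keep only the links that match at least one include keyword.

    Each link may be compared against every pattern, so the worst case is
    len(links) * len(patterns) checks.
    """
    return [url for url in links if matches_any(url, patterns)]
```

With 97 entries the extra work per link is small; it is lists with many thousands of rules where the slowdown would start to show.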