Want to "freeze" projects by year and continue downloading

Pablo 11/09/2006 06:59 am
I am downloading about 100 sites daily, which currently amounts to about 50 Gigs. This data is indexed daily with a text-indexing application.

The amount of data and the indexing process are becoming too big, so here is what I would like to do: "freeze" those 100 sites stored in a folder named, say, '2006', index it, and then leave that data and its index frozen. Then I would create a new folder, '2007', for the pages downloaded from now on, with a new index covering only the '2007' folder.

The problem is that if I create the new (empty) '2007' folder, I think the next time OE runs it will download most of the 'old' pages again, so the '2007' folder will quickly become as big as the '2006' folder.

Is it possible for OE to check for already downloaded files in the '2006' folder, but save the new ones to the '2007' folder? How could I do this?

Thank you very much for your help.
Pablo.
Oleg Chernavin 11/09/2006 07:23 am
Well, if two folders use different download directories, this is impossible. The only workaround might be to use URL Filters to filter out pages that contain older dates (if the site's URLs contain a date in some way).
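A rough Python sketch of that kind of date filter, assuming the site puts a four-digit year in its URLs as a separate path segment or query value. The pattern, the cutoff year and the function names are hypothetical; this is not how Offline Explorer's URL Filters are configured, only the rule such a filter would express:

```python
import re

# Hypothetical pattern: a year such as 2006 appearing as /2006/ or as year=2006.
YEAR_IN_URL = re.compile(r"(?:/|=)((?:19|20)\d{2})(?=/|&|$)")


def is_older_than(url: str, cutoff_year: int) -> bool:
    """True if the first year found in the URL is earlier than cutoff_year."""
    match = YEAR_IN_URL.search(url)
    return match is not None and int(match.group(1)) < cutoff_year


def filter_urls(urls, cutoff_year):
    """Keep URLs that carry no year, or a year at or after cutoff_year."""
    return [url for url in urls if not is_older_than(url, cutoff_year)]
```

URLs without a recognizable year always pass, so this only helps when the site's URLs really do encode dates.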

Best regards,
Oleg Chernavin
MP Staff
Pablo 11/09/2006 08:15 am
Thank you Oleg for your answer.

I'm very sorry that it can't be done.

I think this may be a very common problem: folders that become too huge to manipulate, index, store, back up, etc.
Do you think it would be very difficult to have a parameter that tells OE to check for existing files in one folder, but download the new ones to a different folder? That would be a great solution!

Thank you again.
Pablo.
Oleg Chernavin 11/09/2006 08:32 am
I agree, but it would have to be set up somehow, and too many settings make Offline Explorer harder to operate. I am afraid this option would be used too rarely.

Oleg.
Pablo 04/29/2007 03:07 pm
Dear Oleg, I'm still trying to find a way to solve this problem. My directories are now 80 Gigs in size, and that's too difficult to maintain, back up, etc.

Let me ask a different question: what happens if I move all my downloaded directories to a new location, completely deleting them from the place where OE downloads now? If I then run OE, will it download most of those pages again, or will it use the information stored in \Documents and Settings\... to skip the already existing files?
In other words: does OE maintain a "map" of all the files it has already downloaded from each site?

I need to find a way to move those 80 Gigs to a different location, but I don't want OE to start downloading them again...!

Thank you very much for your help, and for an excellent product.
Oleg Chernavin 04/30/2007 04:34 am
If you want simply to change the download location, then select the folder in Offline Explorer, click the Properties button and change the Download Directory setting. Click the OK button. You will be prompted to move downloaded Projects to the new location.

However, if you need to move previously downloaded files to another drive and download updated files to the previous directory, Offline Explorer is unable to use the files on the other drive to skip duplicates.

Oleg.
Pablo 05/21/2007 02:21 pm
Dear Oleg, sorry to bother you again with this issue, but I'm still trying to find the simplest solution for this problem. I think that, sooner or later, many OE users will need the ability to "freeze" and archive projects because they have grown just too big to move, index, and process.

This is "the best" solution I have found so far:

Create a new URLs field parameter, "CheckExistentAlsoOn=", followed by the path to the "project archive".
Like this: CheckExistentAlsoOn=d:\OldArchive\2006\www.domainname.com\

It should work like this: when OE checks whether a file exists, it should check BOTH the usual project location AND the path given by the CheckExistentAlsoOn= parameter.

That's it.
This way, you can move a huge amount of files to a "project archive" with a permanent, unchanging index, and the project will continue downloading to the usual folder while also checking for duplicates in the project archive.
It is also very easy to move the "newly downloaded" files to the "project archive" from time to time.
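A minimal Python sketch of the proposed check, assuming the archive keeps the same relative layout as the live download folder (site folder, then the URL path). The function names are hypothetical and this is not Offline Explorer's code; it only illustrates the "look in both places" rule:

```python
from pathlib import Path
from typing import Iterable


def exists_in_any(relative_path: str, roots: Iterable[Path]) -> bool:
    """True if relative_path exists as a file under at least one root folder."""
    return any((Path(root) / relative_path).is_file() for root in roots)


def needs_download(relative_path: str, download_dir: Path, archive_dir: Path) -> bool:
    """Skip a file found either in the usual project folder or in the archive."""
    return not exists_in_any(relative_path, [download_dir, archive_dir])
```

New files would still be saved only to download_dir; the archive is only read, never written, which is what lets its index stay frozen.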

I hope this can be implemented... it doesn't seem too difficult... I hope.
Thank you,
Pablo.
Oleg Chernavin 05/21/2007 02:44 pm
Yes, it is one solution. The problem is that Projects are not implemented to work with their own download directory; they depend on the Folder settings. It will be hard to implement this support. Also, it might be even better to add a feature to archive or back up a Project and to look for files in the backup archive.

Oleg.
Pablo 05/22/2007 12:41 pm
I don't completely understand your point. In my proposal, OE would work just as it works now. The only addition is that when checking whether files exist, it would check BOTH the folder it checks now and the folder given by CheckExistentAlsoOn=

It is the system administrator's responsibility to set the correct path in the CheckExistentAlsoOn= parameter.

Hope it helps...
I need this...

Oleg Chernavin 05/24/2007 03:23 pm
OK. I implemented this. The updated oe.exe file is here:

http://www.metaproducts.com/download/betas/OEP2606.ZIP

You will have to use the following URLs field command:

OtherDownloadDir=e:\directory\

Oleg.
Pablo 05/24/2007 05:57 pm
Oleg, that's fantastic !!
Unfortunately, I will not be able to test it until next Monday...
Thank you very much for your great service and great product.
I will let you know the result of my tests...
Pablo.
Oleg Chernavin 05/25/2007 04:55 am
Thank you!

Oleg.
Pablo 05/28/2007 07:43 am
Dear Oleg,
Today I will start testing the new build 2606 I have just downloaded.

I want to confirm how to use the new OtherDownloadDir= parameter. I have two questions:

1) If I have a project currently downloading to D:\OE\PROJECTS\SITE\ and I move it completely to an "old project archive" in D:\BACK.UP\PROJECTS\SITE\, would the right command be:
OtherDownloadDir=D:\BACK.UP\PROJECTS\SITE\ or
OtherDownloadDir=D:\BACK.UP\PROJECTS\ ?

2) I'm using the advanced option (checkbox) "prevent overloading Windows filesystem", so OE splits large folders with over 1000 files into smaller ones. Does the new OtherDownloadDir= take this into account? Old files might be in those subfolders named %&OvrX...
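For illustration, a lookup that also searches the split-up subfolders might look like this Python sketch. It assumes, from the folder names mentioned above, that every overflow subfolder's name starts with "%&Ovr"; the exact naming scheme and the function name are assumptions, not Offline Explorer's documented behaviour:

```python
from pathlib import Path
from typing import Optional

# Assumption: overflow subfolders are named "%&Ovr" followed by a number.
OVERFLOW_PREFIX = "%&Ovr"


def find_file(folder: Path, filename: str) -> Optional[Path]:
    """Return the path of filename in folder or one of its overflow subfolders."""
    direct = folder / filename
    if direct.is_file():
        return direct
    if not folder.is_dir():
        return None
    # Only descend into subfolders that look like overflow folders.
    for sub in sorted(folder.iterdir()):
        if sub.is_dir() and sub.name.startswith(OVERFLOW_PREFIX):
            candidate = sub / filename
            if candidate.is_file():
                return candidate
    return None
```

A check that looks only at folder / filename would miss every file that was moved into an overflow subfolder, which is why duplicates would be downloaded.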

Thank you,
Pablo.

Oleg Chernavin 05/28/2007 07:51 am
1. If your site on the alternate backup location is d:\backup\oe\www.site.com\... then you should use:

OtherDownloadDir=d:\backup\oe\
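One way to see why the parameter points at the parent folder: assuming each site is stored in a folder named after its host directly under the configured directory (as in the d:\backup\oe\www.site.com\ example), the same relative path is simply appended to each base directory. A Python sketch with a hypothetical function name, not Offline Explorer's code:

```python
from pathlib import PureWindowsPath
from urllib.parse import urlsplit


def candidate_paths(url: str, download_dir: str, other_download_dir: str):
    """Return where a URL's file would sit under each base directory.

    Assumes each site is saved in a host-named folder under the base directory.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    relative = PureWindowsPath(parts.netloc, *segments)
    return [PureWindowsPath(download_dir) / relative,
            PureWindowsPath(other_download_dir) / relative]
```

If the parameter pointed at d:\backup\oe\www.site.com\ instead, the host folder would be appended twice (...\www.site.com\www.site.com\...) and no archived file would be found.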

2. I made a quick implementation of this feature and it doesn't support overloaded directories yet.

Oleg.
Pablo 05/28/2007 08:48 am
Mmmm, I think that will have to wait until you have time to implement checking for existing files in the \%&OvrX folders, because ALL my projects are big enough to be stored in many \%&OvrX folders.

If I run it now, I will download zillions of duplicate files.

Thank you very much,
Pablo.
Pablo 06/04/2007 07:43 am
Hello Oleg,

Any chance of finishing the "OtherDownloadDir=" command with support for overloaded directories?

Thank you,
Pablo.
Oleg Chernavin 06/05/2007 12:51 pm
OK. Here is the update that should work with them. Sorry that it took so long to add.

http://www.metaproducts.com/download/betas/OEP2612.ZIP

Oleg.
Pablo 06/07/2007 12:41 pm
GREAT! I will start testing the new feature.
Thank you !
Pablo 06/07/2007 09:38 pm
Dear Oleg,

I have started to test the new OtherDownloadDir= command, with some strange results:

A- In one of my sites, to test OtherDownloadDir=, I moved all the files to a backup folder. Then I started the download again, and... bingo! No new file was downloaded, since OEP checked for each file's existence in the backup folder, so there was nothing to download.

B- However, on several other projects, after moving their data and refreshing the project... everything is downloaded again.

To find the cause of this behaviour, I tried to "refresh" these projects (B) without the OtherDownloadDir= command, and found that these projects download existing files again and again. So the problem is that the check for existing files is not working properly.
Since most of my projects are "news style projects", the setting I use is "Exclude existing files on levels over 0". This should mean that ANY EXISTING FILE will not be downloaded again, right?

What might be the problem?
I think that the problem is not in the OtherDownloadDir= command, but in the check for existing files. This is something I noticed months ago, but didn't give enough attention to.

My theory is that when you check "Exclude existing files on levels over 0", OEP does "something else" besides checking for the existence of the file; it is checking the date or size or something... that causes the file to be downloaded again.

Any help?

Thank you very much,
Pablo.
Oleg Chernavin 06/08/2007 01:15 pm
OK. Could you post such a Project here, so I can reproduce it? Please select it, click the Copy button on the toolbar and then paste it into the forum message.

Oleg.
Pablo 06/08/2007 02:41 pm
Oleg,

I think that I have found something:

If I update a project (Shift-F5), OEP does not take existing files into account and downloads all (or most) of the files again, ignoring files that exist in either the current download folder or the OtherDownloadDir= folder. I think this is NOT correct.

If I update a project with F5, it seems that OEP does take into account both the existing files in the current download folder and those in the "backup folder" (OtherDownloadDir=). This IS correct.

Hope this helps.
Regards,
Pablo.


The project's info is the following:

[Object]
OEVersion=Pro 4.7.0.2612
Type=0
IID=2
Caption=Punto a Punto
URL=http://www.puntoapunto.com.ar/default.asp
Additional=AddOriginalURL
ChannelsPerServer=10
OtherDownloadDir=D:\FI1.BU\OE\AR\
Lev=3
Hour=21
Minute=30
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FMGroup=3
LTMethod=1
pswMethod=2
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovfliflcvivrmramrvasfasxwmvm1vm2vvob
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdf
FTUDef.Exts=classcssdtdjsssiswfvbsxsl xxxxxoxx
FTText.B=ooxooo
FTImages.B=xoxoxo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=xoxooo
FTSizes=0,0,20,0,0,0,0,0,0,0,0,0,0,0,0,1,3,3,0,1,0
RSrvsBx=1
RPathBx=2
RPathEx=agendadestacado_diariosregistroedicionesframesincadmanager xxxxxxx
RFileBx=2
RFileEx=recomendarnota.aspopinion.aspbuscaimg.aspcumpleagno.asp xxxx
RProtBx=2
RProt=1
LastStart=77:14:218:213:52:41:227:64:
LastEnd=108:45:106:219:52:41:227:64:
S200=24
S304=2632
SPar=2656
SSav=24
SLast=304
SSiz=738510
SMdf=24
LFiles=2656
LSize=738510
CopiesFmt=2
CopiesDate=YYYYMMDDHHNN
Flags=1
ImgDim=0,0,0,0
PrevURL=http://www.puntoapunto.com.ar/default.asp
IPAddr=-1918244151
Oleg Chernavin 06/11/2007 01:31 pm
Well, this looks like correct behavior. When you press Shift-, Ctrl- or Alt-F5, it ignores the File Modification check settings in the Project Properties and uses the method you chose with the keystroke. And it looks like the server always reports these files as changed or new, so updating the site causes them to be redownloaded every time.
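A simplified Python model of the difference described here, assuming only two modes: the Project setting that skips any file already on disk, and a forced update that asks the server whether each file changed. The names are hypothetical, and real Offline Explorer settings have more options than this:

```python
from enum import Enum


class UpdateMode(Enum):
    SKIP_EXISTING = 1   # e.g. "Exclude existing files" in the Project settings
    ASK_SERVER = 2      # e.g. a forced update that re-checks every file


def should_download(mode: UpdateMode, exists_locally: bool,
                    server_reports_changed: bool) -> bool:
    """Decide whether to fetch a file again under this two-mode model."""
    if not exists_locally:
        return True
    if mode is UpdateMode.SKIP_EXISTING:
        return False
    # In ASK_SERVER mode the server's answer decides, e.g. HTTP 200 vs 304.
    return server_reports_changed
```

With a server that reports every file as changed, ASK_SERVER re-fetches everything on each run, while SKIP_EXISTING never asks the server about files already on disk.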

Oleg.
Pablo 06/11/2007 06:23 pm
OK, understood. Then if I work with a "backup folder" and the OtherDownloadDir= command, I will have to avoid using Shift-F5, Ctrl-F5 and Alt-F5 by all means, and only use F5 or a scheduled download. Otherwise, I will get my folders filled with duplicates... am I correct?
I think that if a project includes OtherDownloadDir=, OEP should prevent any action that could result in duplicates... don't you think?

Now let me ask a different question about this: I have read that OE Enterprise creates a compact database of all downloaded files, and this database is used to avoid downloading duplicates. What if I use OEE instead of OEP... I could periodically delete all the project's files, moving them manually to the backup folder, and OEE would still download only new and modified files, correct?
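To make the idea concrete, here is a generic Python sketch of a URL-keyed download record that survives deleting or moving the files themselves. It is not a description of how OE Enterprise's database works; the class and its fields are hypothetical, and it assumes the server sends a Last-Modified value:

```python
from typing import Dict, Optional


class DownloadRecord:
    """Hypothetical record of past downloads, kept apart from the saved files."""

    def __init__(self) -> None:
        self._last_modified: Dict[str, Optional[str]] = {}

    def remember(self, url: str, last_modified: Optional[str]) -> None:
        """Store the server's Last-Modified value (or None) after a download."""
        self._last_modified[url] = last_modified

    def should_download(self, url: str, server_last_modified: Optional[str]) -> bool:
        if url not in self._last_modified:
            return True  # never downloaded before
        stored = self._last_modified[url]
        if stored is None or server_last_modified is None:
            return True  # no usable date, so the file cannot be shown to be unchanged
        return stored != server_last_modified
```

The last check is the weak point: if the site does not report usable modification dates, even a complete record cannot tell an unchanged file from a new one.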

Thank you for your help!
Pablo.
Oleg Chernavin 06/12/2007 04:16 pm
Well, it would work about the same way as with the backup directory. If the site doesn't provide correct file update information, Offline Explorer will download the same files again, because it thinks they are new.

Oleg.
Kuro6 08/31/2009 03:05 am

I also want to add that

DeleteAfterParsing=*.html,*.htm,*.asp

when put into the box, just deletes everything (.jpg, .zip, etc.) after download, before it can be moved to the new folder.
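For comparison, the expected behaviour can be sketched as a wildcard filter in Python, assuming DeleteAfterParsing= takes a comma-separated list of case-insensitive file masks (an assumption drawn from the example above, not from documentation; the function names are hypothetical). Only files matching one of the masks would be deleted:

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def parse_masks(value: str) -> List[str]:
    """Split a mask list such as '*.html,*.htm,*.asp' into lowercase masks."""
    return [mask.strip().lower() for mask in value.split(",") if mask.strip()]


def files_to_delete(filenames: Iterable[str], masks: List[str]) -> List[str]:
    """Return only the file names that match at least one mask."""
    return [name for name in filenames
            if any(fnmatchcase(name.lower(), mask) for mask in masks)]
```

Under this reading, .jpg and .zip files match none of the masks and would be kept.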
Oleg Chernavin 09/06/2009 04:49 am
Can you give me the Project settings? Select it, use Export - Project Settings - Copy and paste them into the forum message.

Oleg.
Kuro6 09/06/2009 01:06 pm
Oh, and I just wanted to add that archive files like zip and rar are always downloaded, even if they are already in the original download folder. From what I can see, they seem to download as a .php file and are then renamed by OE after download. It appears the program looks for the .php file when comparing, not the zip file. So they are always downloaded, no matter what. A big problem when you update downloaded content often.
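If that reading is right, the duplicate check compares the URL-derived name (the .php file) with what is on disk, while the disk holds the renamed .zip. A Python sketch of a check that also tries the final name, assuming a mapping from the URL-derived name to the name the file was saved under is available (the mapping and the function name are hypothetical):

```python
from pathlib import Path
from typing import Dict


def already_saved(folder: Path, url_name: str, saved_as: Dict[str, str]) -> bool:
    """True if the file exists under its URL-derived name or its renamed name."""
    names = [url_name]
    if url_name in saved_as:
        names.append(saved_as[url_name])
    return any((folder / name).is_file() for name in names)
```

Without such a mapping, only the .php name is checked, it is never found on disk, and the archive is fetched again on every update.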
Kuro6 09/06/2009 01:10 pm
Gah, no edit post button. I also wanted to say that I am using OE Enterprise 5.6, not Pro. It's just that I found this message in the OE Pro forum and thought it was a good place to post and continue the discussion.
Oleg Chernavin 09/07/2009 09:00 am
I need to reproduce this issue in order to make the improvement. Can you give me access to that site? You may send the details to support@metaproducts.com

Oleg.
Kuro6 09/07/2009 02:18 pm
Hey Oleg, I sent you an email with the login information.
Oleg Chernavin 09/07/2009 03:11 pm
OK. If you don't get a reply from me tomorrow, please post here again.

Oleg.
Kuro6 09/07/2009 05:04 pm
Thanks, Oleg.
Oleg Chernavin 09/11/2009 05:49 am
I haven't received an E-mail from you. I wrote to you directly myself.

Oleg.
Kuro6 09/11/2009 02:21 pm
Sorry, my Gmail account seems to be fried. I can't access it at all. Could you please send the email to Logan9773@hotmail.com?
Oleg Chernavin 09/14/2009 06:44 am
OK. I sent it.

Oleg.