Save parsing to file / slow parsing

Author Message
Blaine Dally 03/29/2007 07:47 pm



Dear MetaProducts or whom it may concern,


I'm trying to update and complete my copy of a large website (50+ GB) without re-downloading existing files. To do this, OE must parse hundreds of thousands of files. Unfortunately this is a very slow process, I think because the site uses complex PHP scripting and contains more than a million links. I'm using the latest version of OE Enterprise, and I've tried all the suggestions made on this forum to speed up parsing, with no noticeable effect. I must use "evaluate script calculations" or the website will not download completely.
I'm sure my hardware and OS are functioning correctly. While parsing, OE uses anywhere between 2% and 75% of available CPU with no other program running, which leads me to believe the slowdown is coming from disk access. I've verified there is nothing wrong with my standard IDE drive's read or write access. My guess is that Windows and/or OE does not cope well with the deep directory nesting and the number of files. I am using "prevent download directories from overloading", but as you may know, even with this activated it can still be slow to access a large number of files in Windows Explorer, in other programs, and I would assume in OE as well.
At the speed at which parsing is progressing, I'm guessing it would take at least a week (without crashes) to complete before I could suspend the project's download queue to a file. Unfortunately I am unable to dedicate a system solely to this task and must use it for other things as well. And as anyone who does many things on a Windows system (even XP) knows, crashes and lockups are still frequent; I'm averaging one about every three days. Since there is no way to save my parsing progress to a file, I have to start over each time. At this rate I'm probably looking at several weeks (if ever) just to complete the parsing on this project.
As I understand from reading this forum, a feature to save parsing progress or "suspend to file" has not yet been implemented for OE. So I'm asking whether there is perhaps some hidden or untested method of doing this. If not, I think you'd make many customers happy by implementing it.

This problem aside, I've tested many offline browsers and can say with certainty that OE is the best available. Support is outstanding as well. Thank you, and please keep up the good work!

Sincerely,
Blaine Dally
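
For readers wondering what "saving parsing progress to a file" could look like in principle, here is a minimal Python sketch. It is not how Offline Explorer works internally and does not use any OE feature; the names `save_checkpoint`, `load_checkpoint`, `parse_all`, and the JSON file layout are all hypothetical. The idea is simply to write the pending queue and the list of already parsed files to disk every so often, so a crash only loses the work done since the last save.

```python
import json
import os
import tempfile
from collections import deque


def save_checkpoint(path, pending, done):
    """Write the pending queue and the parsed-file set to disk atomically."""
    state = {"pending": list(pending), "done": sorted(done)}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    # os.replace swaps the file in one step, so a crash during the write
    # leaves the previous checkpoint untouched.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (pending, done) from disk, or empty containers if no file exists."""
    if not os.path.exists(path):
        return deque(), set()
    with open(path, encoding="utf-8") as handle:
        state = json.load(handle)
    return deque(state["pending"]), set(state["done"])


def parse_all(files, checkpoint_path, parse_one, every=1000):
    """Parse files in order, saving progress after every `every` files."""
    pending, done = load_checkpoint(checkpoint_path)
    if not pending and not done:
        pending = deque(files)  # no checkpoint yet: start from scratch
    processed = 0
    while pending:
        name = pending[0]
        parse_one(name)          # hypothetical per-file parser supplied by the caller
        pending.popleft()        # only dequeue after the parse succeeded
        done.add(name)
        processed += 1
        if processed % every == 0:
            save_checkpoint(checkpoint_path, pending, done)
    save_checkpoint(checkpoint_path, pending, done)
    return done
```

Calling `parse_all` again with the same checkpoint path after a crash resumes from the last saved queue instead of starting over. A smaller `every` loses less work per crash but writes to disk more often.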
Oleg Chernavin 03/30/2007 05:56 am
We profiled the parsing speed, and with scripts enabled, script evaluation accounts for most of the slowdown. Perhaps it could be improved by adding more parsing threads, if your system is a dual- or quad-CPU one. Please let me know how many CPUs you have.

Best regards,
Oleg Chernavin
MP Staff
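
As a rough illustration of the "more parsing threads" idea, the Python sketch below spreads link extraction over several workers. It is only a sketch under stated assumptions and says nothing about how Offline Explorer is implemented: `extract_links`, `parse_in_parallel`, the simple `href` regular expression, and the chunk size of 64 are all hypothetical choices made for the example. Worker processes are used by default because in CPython, CPU-bound work generally does not speed up with plain threads.

```python
import os
import re
from concurrent.futures import ProcessPoolExecutor

# Deliberately simple pattern for the example; a real parser handles far more cases.
HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def extract_links(html):
    """Return every double-quoted href value found in one page."""
    return HREF.findall(html)


def parse_in_parallel(pages, workers=None, executor_cls=ProcessPoolExecutor):
    """Parse pages on several workers; results keep the input order."""
    workers = workers or os.cpu_count() or 1
    with executor_cls(max_workers=workers) as pool:
        # chunksize batches pages per task to reduce inter-process overhead
        # (it has no effect when a thread pool is passed in).
        return list(pool.map(extract_links, pages, chunksize=64))
```

With the default process pool, the call should be made from under an `if __name__ == "__main__":` guard on Windows. On a single-CPU machine this approach gains little, which is why the number of CPUs matters.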
Kim 11/26/2007 08:42 pm
> We profiled the parsing speed and with the scripts enabled, it gives the most of the slowdown. Perhaps, it could be improved by adding more parsing threads, if your system is dual or quad-CPU one. Please let me know how many CPUs do you have.
>
> Best regards,
> Oleg Chernavin
> MP Staff

I also agree with adding more parsing threads. My parsing process is slower than the downloading process, so the parsing queue grows to hundreds of thousands of files while the downloading queue doesn't have enough files to download. Too many files in the parsing queue will sometimes kill OEE.
I suggest you find a way to increase parsing speed.
Oleg Chernavin 11/27/2007 10:22 am
Yes, this is in our plans for Offline Explorer Enterprise 5.0.

Oleg.