OE pro hangs on parsing

Michael Welsch
03/29/2006 07:02 am
Hi,

I've tried the current version (2.8 SR2), but I still have the same problem: after downloading some files, OE hangs for a while, using a lot of CPU, and the "Parsing" section in the status bar shows 1 or more. I had the same problem with SR1, so what can I do?

Thanks for everything

Michael
Oleg Chernavin
03/29/2006 07:02 am
Michael,

Does it hang completely, or does it simply parse files and then start loading other links? And how long does it take?

Thank you!

Best regards,
Oleg Chernavin
MetaProducts corp.
Michael Welsch
03/29/2006 07:02 am
> Does it hang completely, or does it simply parse files and then start
> loading other links? And how long does it take?

First of all, an example URL where the error occurs is www.zeitzuleben.de.
After about 210 files, OE hangs while parsing a file (I cannot see which one). That is, the application does not respond to any key or mouse events, and no further file downloads start (or perhaps they do, invisibly?).

After a while (a few seconds to several minutes) the process resumes. A few files later, the same problem occurs again.

I've already tried switching off all parsing options, without success. I also tried reinstalling OE (uninstalling and installing it again).

If I remember correctly, this problem first occurred after downloading version 2.8 SR1. (It is the same in the current version, 2.8 SR2, as mentioned.)

Hope this will help you to solve my problem.

Thanks again!
Oleg Chernavin
03/29/2006 07:02 am
Can I ask you to turn on logging to see the actual file? Please press Ctrl-W on your keyboard, then right-click the Log Window, uncheck Pause to enable logging, and keep only Queue Messages checked under the Filters menu. Then start the download and watch for the "Queue: Parsing ..." messages. These will give you the exact link that is being parsed. Please let me know which file it stops on for such a long time.

Thank you!

Oleg.
Michael Welsch
03/29/2006 07:02 am
> Can I ask you to turn on logging to see the actual file? ...

The first file is: http://www.zeitzuleben.de/serv/shop/org/index.html

Thanks
Oleg Chernavin
03/29/2006 07:02 am
This is strange. When I load that file, I experience no delay at all. I will keep working on this problem. I am developing new queue code now, so I will consider certain performance improvements there, such as a multithreaded queue.

Thank you.

Oleg.
Scotty
03/29/2006 07:02 am
> This is strange. When I load that file, I experience no delay at all. I will keep working on this problem. I am developing new queue code now, so I will consider certain performance improvements there, such as a multithreaded queue.

I'm having a similar problem with OEP 2.9 b1280. The URLs are as follows:

http://www.ladiscusion.cl/diario/
Referrer=http://www.ladiscusion.cl/diario/
http://www.diarioladiscusion.cl/diario/calendario/archivo.php
http://www.diarioladiscusion.cl/diario/calendario/archivo.php?calstyle=1&mo={:1..9}&yr=2003
http://www.diarioladiscusion.cl/diario/calendario/archivo.php?calstyle=1&mo={:10..12}&yr=2002
http://www.diarioladiscusion.cl/diario/?col=0&control=cuno&id_noticia_p={:18952..83000}

After a minute or so, the following happens:

(1) OEP starts downloading in "spurts" -- it will download some 10 files (I have the program set to allow 10 connections), and then spend a minute or two parsing. Then another spurt of downloading, and then another minute or two of parsing, during which NO downloading at all goes on (this is graphically clear on a bandwidth monitor program I use). OEP uses only about 25% of my bandwidth when it actually does some downloading.

(2) If I hit F9 to suspend the project, the parsing goes on, taking 5-10 seconds to parse each file.

(3) If I try to suspend to a file, OEP freezes and I have to kill its process to shut it down. I've waited up to 2 hours for the program to respond, but it's crashed.

(4) The program is **extremely** unresponsive when downloading. Sometimes it takes 30 seconds for it to come up when selected with ALT-TAB or the taskbar.

My computer isn't exactly new, but it has 2 PIII 500 MHz CPUs and 512 MB of RAM. And more importantly, it doesn't behave this way with all sites.

Could this have to do with the URL macros?
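
For a sense of scale, here is a minimal sketch of how many URLs such macros produce. It assumes that `{:first..last}` stands for every integer from first to last, inclusive; that reading is inferred from the URLs above and is not taken from the OEP documentation. The names `count_urls` and `expand` are made up for this illustration.

```python
import re
from itertools import product

# Assumption: "{:first..last}" means every integer from first to last, inclusive.
MACRO = re.compile(r"\{:(\d+)\.\.(\d+)\}")


def count_urls(template):
    """Return how many URLs a template expands to, without building them."""
    total = 1
    for first, last in MACRO.findall(template):
        total *= int(last) - int(first) + 1
    return total


def expand(template):
    """Yield every URL obtained by substituting each value of each macro."""
    ranges = [range(int(a), int(b) + 1) for a, b in MACRO.findall(template)]
    # A group-free copy of the pattern leaves only the literal text between macros.
    literals = re.split(r"\{:\d+\.\.\d+\}", template)
    for combo in product(*ranges):
        url = literals[0]
        for value, literal in zip(combo, literals[1:]):
            url += str(value) + literal
        yield url


ARTICLES = "http://www.diarioladiscusion.cl/diario/?col=0&control=cuno&id_noticia_p={:18952..83000}"
print(count_urls(ARTICLES))  # 83000 - 18952 + 1 = 64049
```

Under that assumption, the last template alone accounts for 64,049 queue entries before any links found by parsing are added. Whether the queue size is related to the slowdown is not established here.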
Scotty
03/29/2006 07:02 am
Now that I'm paying more attention to this issue, it seems that OEP is having the parsing problem I described in my previous post with many sites -- all of the sites I've tried to D/L so far, in fact.

I installed the latest version yesterday (see previous post for program and system details), and that was when the problems started. Before that, I was using build 1258 (I think), and never saw anything like this.

Cheers.
Scotty
03/29/2006 07:02 am
Don't mean to hijack this thread... I just figure that the more information, the merrier.

The project I'm currently downloading is having the same parsing problems. Here are the URLs I'm using:

http://www.eldiario.cl/
http://www.eldiario.cl/OpinionesForo.asp?idforo={:1..2760}
http://www.eldiario.cl/shnoti.asp?noticia={:1..31000}
http://www.eldiario.cl/template.asp?noticia={:1..60000}

I have the download suspended and OEP is parsing at the rate of about 1 URL every 2-3 seconds. To do so, it is using 50% of my CPU cycles, which, since this rig has 2 x PIII 500s, means that (a) it is using 500 MHz of processing power to do this, and (b) it's only limiting itself to that because OEP is not SMP enabled.

When I start downloading (unsuspending the project), I get regular but very small spurts of downloads, averaging out to maybe 14 kbps on a 256 kbps DSL connection, with no other net activity, and also, OEP becomes horribly unresponsive -- so much so that when I try to switch to it from another program, it can take up to a minute (sometimes more!) to show.
Scotty
03/29/2006 07:02 am
An update on this problem:

This problem is occurring even with simple URLs such as this:

http://www.eldiario.cl/template.asp?noticia={:8023..60000}

And even when I have the level limit set to 0.

My average D/L speed is currently 70 kbps on a 256 kbps DSL connection, and OEP is still showing all the problems I mentioned above.

Oleg Chernavin
03/29/2006 07:02 am
I tried to reproduce that behavior with two of your URL combinations. However, on my computer parsing takes much less than a second and I didn't notice any user interface slowdown. The download speed is about 80 kbytes per second (2 MBit Internet connection), so that looks like the maximum speed at which the server can output files.

I have a 2.4 GHz P4 CPU. I have some thoughts on why the user interface slows down. Currently, parsing works in a separate thread, while adding files to the queue is done in the main thread. I will try to improve this in version 3.0.
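
The general idea of taking queue insertion off the main thread can be sketched as follows. This is only an illustration under assumed names (`DownloadQueue`, `parser_worker`) and is not Offline Explorer's actual code: a worker thread receives the links found in each parsed page and adds them to a lock-protected queue, so the main thread only has to hand over work.

```python
import queue
import threading


class DownloadQueue:
    """Hypothetical download queue that is safe to fill from a worker thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seen = set()     # every URL ever queued, for duplicate checks
        self._pending = []     # URLs still waiting to be downloaded

    def add_many(self, urls):
        """Add unseen URLs and return how many were actually added."""
        added = 0
        with self._lock:
            for url in urls:
                if url not in self._seen:
                    self._seen.add(url)
                    self._pending.append(url)
                    added += 1
        return added

    def __len__(self):
        with self._lock:
            return len(self._pending)


def parser_worker(parsed_pages, download_queue):
    """Take link lists from parsed_pages until a None sentinel arrives."""
    while True:
        links = parsed_pages.get()
        if links is None:
            break
        download_queue.add_many(links)


downloads = DownloadQueue()
parsed_pages = queue.Queue()
worker = threading.Thread(target=parser_worker, args=(parsed_pages, downloads))
worker.start()

# The main thread only hands over work and stays free for user input.
parsed_pages.put(["http://www.eldiario.cl/a", "http://www.eldiario.cl/b"])
parsed_pages.put(["http://www.eldiario.cl/b", "http://www.eldiario.cl/c"])
parsed_pages.put(None)
worker.join()
print(len(downloads))  # 3 distinct URLs queued
```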

Oleg.
ssieloff
03/29/2006 07:02 am
I have also noticed a performance slowdown, but it occurs whenever I use the "File Copies" "keep old copies" option. It appears to me that the system gets slower and slower as the downloads progress -- I think it has to do with the fact that the program is renaming all the files in the directory and then saving the current one as mypage.asp(1). Each subsequent download causes all the existing pages to be renumbered/renamed.

Perhaps the algorithm should just number the pages sequentially rather than renumbering them after each download -- so mypage.asp(1) would be the first page downloaded and mypage.asp(999) would be the most recently downloaded page. The current program does the opposite, renaming all downloaded files in the directory.
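
To put a number on the difference, here is a small in-memory sketch of the two numbering schemes. It only counts renames and touches no files; the function names are invented for the illustration, and the "newest is (1)" behavior is modeled on the description above rather than on the program's code.

```python
def save_newest_first(copies, content):
    """Shift every existing copy up by one, then store the new one as (1)."""
    renames = 0
    for number in sorted(copies, reverse=True):
        copies[number + 1] = copies.pop(number)
        renames += 1
    copies[1] = content
    return renames


def save_sequential(copies, content):
    """Store the new copy under the next free number; nothing is renamed."""
    copies[len(copies) + 1] = content
    return 0


def total_renames(save, downloads):
    """Count the renames needed to save the same page `downloads` times."""
    copies = {}
    return sum(save(copies, n) for n in range(downloads))


print(total_renames(save_newest_first, 999))  # 0 + 1 + ... + 998 = 498501
print(total_renames(save_sequential, 999))    # 0
```

With 999 copies of one page, the renumbering scheme performs close to half a million renames in total, while sequential numbering performs none.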

Just an idea ....

Steve


Oleg Chernavin
03/29/2006 07:02 am
Yes, File Copies slows down the download process when it renames lots of files. However, I was asked to make that kind of sequence, where a lower number means a newer file. To overcome this problem, please select "Use file Date/Time..." numbering. This avoids the renaming process.

Another thing that slows downloading is the directory overload protection - it constantly checks the number of files in the directory when a file gets saved.

Oleg.
Scotty
03/29/2006 07:02 am
> I tried to reproduce that behavior with two of your URL combinations. However, on my computer parsing takes much less than a second and I didn't notice any user interface slowdown. The download speed is about 80 kbytes per second (2 MBit Internet connection), so that looks like the maximum speed at which the server can output files.
>
> I have a 2.4 GHz P4 CPU. I have some thoughts on why the user interface slows down. Currently, parsing works in a separate thread, while adding files to the queue is done in the main thread. I will try to improve this in version 3.0.

These problems don't start immediately, but after a few hours of downloading. OEP slows down *immensely*, and all the symptoms I described above happen. I've reproduced this half a dozen times now. Also, OEP freezes solid when I try to suspend to a file while there is still parsing going on.

If you can, try leaving your machine on all night downloading the sites I mentioned, since this problem does not begin immediately.
Oleg Chernavin
03/29/2006 07:02 am
Thank you. I will work on it.

Oleg.
Trev
03/29/2006 07:02 am
> Thank you. I will work on it.
>
> Oleg.
Oleg, just to add my bit. I have been trying to download from the Symantec website so that I have my own copy of virus definitions etc.: http://securityresponse.symantec.com/ with 4 levels.
I keep getting hangups, and my system resources report 100% CPU usage. I was using a Duron 1.3 with 512MB RAM. I have now upgraded to an Athlon XP2500 and still get the same. My OS is WinXP Home and I am using Ver 2.9.
Oleg Chernavin
03/29/2006 07:02 am
Do you mean temporary user interface hangups (which last for some time, after which OE works normally), during which you can't press a button on the toolbar, access the menu, etc.?

Oleg.
Nic
03/29/2006 07:02 am
Can somebody else really add anything to this thread??? I'm not a real techie but...

I have noticed when I run OEP on an older machine (even with a DSL connection) that if I have too many connections (in most cases more than 3) then I get some similar symptoms. While it can look like OEP is hanging and is using high CPU (like all of it!), it does appear to be doing stuff - just not updating the screen. Having said that, when I re-download the site using fewer connections, it will pick up the odd page it missed the first time - so there is an intermittent problem. But when I reduce my connections - it kicks ass. Truly the best downloader I have ever come across.

Just for your information - and maybe this isn't truly related.

Nic.
Oleg Chernavin
03/29/2006 07:02 am
Thank you very much! It looks like writing the loaded bytes to disk and then running the file integrity checks takes too much time compared with the download speed. I plan to move that code to a separate thread, so the user interface will not suffer. This will take some time, though, so my best estimate for that is version 3.0.
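
As a rough illustration of that plan (not Offline Explorer's actual code), the sketch below moves disk writes and a check onto a worker thread. The names `writer_worker` and `demo` are invented here, and comparing the size on disk with the number of bytes received is only a stand-in assumption for whatever integrity check the program really performs.

```python
import os
import queue
import tempfile
import threading


def writer_worker(jobs, results):
    """Write each (path, data) job to disk and report whether it checks out."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel: no more files to write
            break
        path, data = job
        with open(path, "wb") as handle:
            handle.write(data)
        # Stand-in integrity check: size on disk equals bytes received.
        ok = os.path.getsize(path) == len(data)
        results.put((os.path.basename(path), ok))


def demo():
    """Save two small files on a worker thread and return the check results."""
    jobs, results = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=writer_worker, args=(jobs, results))
    worker.start()
    with tempfile.TemporaryDirectory() as folder:
        # The calling thread only enqueues; it never blocks on disk I/O.
        jobs.put((os.path.join(folder, "index.html"), b"<html></html>"))
        jobs.put((os.path.join(folder, "logo.gif"), b"GIF89a"))
        jobs.put(None)
        worker.join()
    checked = []
    while not results.empty():
        checked.append(results.get())
    return checked


print(demo())
```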

Oleg.
Dion Wiggins
12/24/2010 10:50 am
This thread was from 2003, but we are experiencing the exact same symptoms today with the latest version.
Oleg Chernavin
12/24/2010 10:50 am
Can you please describe exactly how this happens, on which sites, etc.?

Thank you!

Oleg.
Dion Wiggins
12/24/2010 11:35 am
We are using OE Enterprise.

Every website we crawl that is of significant size has this problem. This is not an exception; it's the norm. Looking at the threads in the forums, it seems many people have experienced the same problems.

It starts, and then as the parsing count starts to climb and the queue count also starts to climb, it starts to happen. The UI locks up, it comes back occasionally. The downloads go in bursts of start and stop.

As an example, we have a job with around 400,000 pages downloaded already. It has 1 million pages in the queue and 60,000 in the parsing queue. The OE.exe process is using 76,000K RAM, but CPU is almost 0. In fact, it's 0 most of the time, just occasionally coming back with bursts of 2 or 3 on the CPU.
Oleg Chernavin
12/24/2010 12:14 pm
I received your e-mail as well. I will now prepare an updated version to test - it has significant changes and may help with the issues.

Oleg.