You still haven't fixed the crash even in v6.2..
|Eros||02/29/2012 05:13 am|
|No it was not the script parser... or maybe it was, and you only removed one cause out of an unknown number of causes...
I was spidering anime.desktopnexus.com, downloading all Images and Video only, including SWF, since it is a video too.
OE ran for 5h with 9 connections and then crashed, BOOM. The same kind of crash that has been present since v6.0.
Since I still have 20+ jobs to run, do you have any crash analyzing tool? I saved 2 appcompat logs and made 2 screenshots, but I think that may not be enough, or even anywhere near useful... About 90% of the time the crash looks the same: kernel32.dll related. And sometimes it's about "thread" something...
I can tell you this much, that I couldn't find a pattern in downloaded size and RAM consumption, so it may still be caused by some parsing module...
I think you'll get the crash yourself, when you try to spider anime.desktopnexus.com...
If I get more crashes, should I report those websites, maybe you'll find a pattern in common in all of those sites?
In worst case scenario.. this might even be hardware related..
Also, what's custom Agent identification's max length? I can't use any of my browser's real, valid IDs, it will make OE do a bad job...
For example, if I input:
Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2
and start a job, OE will go nuts and finish the job too early, without any good results, while using the default IDs allows OE to run until it... crashes. I have never yet managed to see OE actually complete a job.. it only crashes..
|Oleg Chernavin||02/29/2012 05:13 am|
|I suspect memory fragmentation. I changed the memory manager. Can you please update the oe.exe file and run the test again?
|Eros||03/02/2012 03:19 am|
|Finally! I think..
Okay, I ran the same job, 25h, no crashes yet!
I'll do 2 more jobs. If it survives 2x24h more, then it's 99.9% fixed (- because my real goal is still actually 2 weeks in a row for 1 job, but I just don't have any jobs like that in March. Maybe in the next month I'll be able to finally do the ultimate spider test...)
But in the meanwhile, I found something else, interesting...
Did you take a look in "Agent identification" module? Since it seems to be defective too.
I was using my own Firefox v10's identity and I started a spidering job, but it was so short - OE quickly ran out of pages to spider.
I then had a thought, that I should try the default Identify as "FireFox 5.0" identity and voila, OE successfully spidered my target 'til completion. What's even worse, the target website doesn't even require any specific browser!
This has happened to me twice already. First time I was using my IE8's identity.
It doesn't seem to depend on text length, it just dislikes any modification to the current default preset or custom input in the "Use this identification" box. Weird. I'll run a few more jobs.
Also, the "Downloaded" info (on the Connections Panel) is incorrect, or is it supposed to show such numbers for "bragging"?
Since I'm using filters and downloading only specific content, the actual download folder's size is smaller and there are fewer downloaded files inside than displayed after "Downloaded".
|Oleg Chernavin||03/02/2012 01:02 pm|
|This is great news about stability! Thank you!
Identification - can you please give me details on how to reproduce this? For example, site URL, Level, other important settings. Agent 1 gives a certain number of pages, Agent 2 another number, etc.
I agree about the Downloaded - I also faced this several times. I will work to understand why it happens and will try to fix it.
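A minimal sketch of how the per-agent comparison requested above could be scripted outside OE, assuming Python's standard library. It only counts the links on a single page fetched with each User-Agent string, which is not how OE's spider works internally; `count_links`, `build_request`, and `links_per_agent` are hypothetical helper names, and the Firefox string is the one quoted earlier in this thread.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# User-Agent string quoted earlier in this thread.
FIREFOX_10 = "Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"


class LinkCounter(HTMLParser):
    """Counts <a> tags that carry an href attribute."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1


def count_links(html):
    """Return the number of <a href=...> tags in an HTML string."""
    parser = LinkCounter()
    parser.feed(html)
    return parser.count


def build_request(url, user_agent):
    """Build (but do not send) a request that identifies as user_agent."""
    return Request(url, headers={"User-Agent": user_agent})


def links_per_agent(url, agents, timeout=30):
    """Fetch url once per agent and return {label: link count}.

    Needs network access; agents maps a label to a User-Agent string.
    """
    results = {}
    for label, user_agent in agents.items():
        with urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        results[label] = count_links(html)
    return results
```

Calling `links_per_agent(start_url, {"firefox10": FIREFOX_10, "other": some_other_agent})` on a job's start page would show whether the server itself answers differently per agent. If the counts match, the difference more likely comes from how OE stores or applies the setting.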
|Eros||03/04/2012 11:36 pm|
|"Use this identification" doesn't like Firefox 10's identity.
I set a custom identification there,
My Firefox 10's identity:
Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2
clicked on Apply, closed the Options window,
but every time I check back there, the "Identify as" radio button is still selected.
Also the "Identify as" box under Internet tab had the previous value of my last default preset selection.
That doesn't seem logical.
So, I selected a default preset under "Identify as" "Internet Explorer", then clicked on
"Use this identification" radio button and then OK.
This time OE accepted it and changed the value under the Internet tab to Internet Explorer's identity info
and also, when I checked back at its settings, "Use this identification" radio button stayed selected.
Then I selected "FireFox 5.0" preset and clicked on "Use this identification" radio button, then OK. It didn't stay, went back to Identify as FireFox 5.0.
Then I inputted my Firefox's identity again and pressed on OK - OE then reverted that setting to "FireFox 5.0" preset.
You know, what's even more bizarre? I then tried with my IE 8's identity, which is:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; InfoPath.2; .NET4.0E)
and I tested also with my Google Chrome's identity:
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11
OE accepted them both. So it only dislikes that Firefox's identity.
Besides IE and FF presets, I haven't checked other presets, they may be bugged too.
I still have yet to encounter the issue, where spider rapidly runs out of pages to crawl with a custom, non default preset's identity set.
(Maybe the identity code has something that tells OE to download the mentioned URLs only and skip further spidering, when the website is built with a specific code?)
I can't foresee the future, but I keep on running jobs. I may not even encounter this anytime soon, but I will restart my spider list after I finish current list and if it's still not fixed in next OE version, then I may definitely encounter that bug again.
And congrats, after 2 jobs I haven't got a single crash!
I think I don't need HTTrack anymore...
But, I still have a lot of jobs to run this month and maybe in next month I can run jobs longer than 24h, then I can finally do the ultimate survival test with OE.
RAM consumption... will OE survive 2 weeks without running out of memory? Is the memory management module built intelligently enough to survive?
|Oleg Chernavin||03/05/2012 08:14 am|
|Yes, I found and fixed the custom user agent identification glitch. Thank you!
Regarding running the downloads during two weeks - I hope it will work well because of the updated memory manager. Please keep me informed on how it works!
|Eros||03/06/2012 12:43 am|
|I found another "glitch".
Options > Files > Disk free space limit
Why doesn't it allow any number higher than 10000?
For end-user's comfort, it should allow any number that is logical for currently selected hard disk and its condition.
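A minimal sketch of the kind of bound suggested above, assuming the limit is entered in megabytes (the thread does not state the unit) and using Python's standard `shutil.disk_usage`. The function names are hypothetical and this is not OE's actual code.

```python
import shutil

MIB = 1024 * 1024


def max_free_space_limit_mb(path):
    """Largest sensible limit for the disk holding path: its total size in MiB."""
    return shutil.disk_usage(path).total // MIB


def clamp_limit_mb(requested_mb, path):
    """Clamp a requested limit to the range 0..disk size instead of a fixed 10000."""
    return max(0, min(requested_mb, max_free_space_limit_mb(path)))
```

With this approach the upper bound follows the selected disk rather than a hard-coded constant, so a value such as 50000 would be accepted on any disk larger than about 49 GiB.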
|Oleg Chernavin||03/06/2012 03:57 am|
|Thank you! Fixed that.
|Eros||03/06/2012 12:36 pm|
|Sad news: the crash occurred again.
Target: kisuki.net + Load up to 2 links on other servers
Run time 'til crash: 3h
The same error messages about Kernel32.dll and EThread.
For this job I thought I'd change the Link Translation setting from No Translation to Offline Translation - that may have been the cause. I'm rerunning the job with No Translation this time to see if it survives 'til morning.
Btw, I noticed that even when the crash occurs, OE keeps on spidering and only terminates when I close the message saying that OE wants to terminate. If I ignore that message, then my screen gets a new set of the same error messages every time it steps on a "mine", or maybe it still has something to do with reaching a memory peak.
|Eros||03/07/2012 01:19 am|
Crashed again. The crash doesn't depend on Link Translation. Kisuki.net or a site it links to has something specifically that makes OE crash. It crashed again after 3h, so the "mine" should be close.
|Oleg Chernavin||03/07/2012 04:57 am|
|Can you please watch memory usage on the System Monitor tab? Perhaps it reaches 2 GB and runs out of memory that way.
|Eros||03/07/2012 09:14 pm|
|I ran the same job for the 3rd time. It seems to crash differently each time, but it still crashed after the same amount of time, 3-4h.
This time, there was no crash message, but it hung and didn't allow me to open it.
I let it run for a bit longer after the hang started, thinking maybe it would finish whatever it was doing, but after 1h it was still hung and displayed no error messages.
Memory usage was 98MB and CPU usage was at max when the hang started (I used Process Hacker, a Task Manager replacement tool, to view the info), but after 1h it stayed at 99.62MB with no CPU usage.
There were no active connections while it was hanging.
So it's not the fault of running out of memory, I think, but it may still be a logical processing "mine" somewhere at Kisuki's place..
|Eros||03/08/2012 12:39 am|
I get different crash results every time I do this job.
Same amount of time, it hung with no error messages, but this time OE ate up to 1.84GB of memory, then at 1.85GB it hung.
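A rough sketch of why 1.85GB is suspicious, assuming OE runs as an ordinary 32-bit Windows process, whose user-mode address space is 2 GiB by default, and assuming the figure reported by Process Hacker is in GiB. The helper names are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Default user-mode address space of a 32-bit Windows process
# (assumption: OE is 32-bit and not large-address-aware).
USER_SPACE_LIMIT = 2 * GIB


def headroom_mib(used_gib, limit=USER_SPACE_LIMIT):
    """Address space still unused, in MiB, for a given usage in GiB."""
    return (limit - used_gib * GIB) / MIB


def near_limit(used_gib, threshold=0.9, limit=USER_SPACE_LIMIT):
    """True when usage has reached the given fraction of the limit."""
    return used_gib * GIB >= threshold * limit
```

At 1.85 the remaining headroom is only about 150 MiB, and with a fragmented heap a large allocation can fail well before the limit itself is reached. The earlier hang at 98MB does not fit this picture, so memory exhaustion would explain at most some of the failures.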
|Eros||03/10/2012 03:04 am|
|Here's another.. annoyance:
For this, OE's window needs to be Maximized and then closed to tray.
Then whenever I start a full screen app, e.g. a video game, OE pops up, minimizing the full screen app window.
Then click on X to close OE back to tray and then try opening the full screen app again, OE pops up again instead of the full screen app.
When it has done so, OE's Minimize button doesn't function, you can then either close it back to tray or let it stay open.
This doesn't happen, when OE is not in Maximized mode.
It seems to have something to do with full screen app changing current screen resolution.
|Oleg Chernavin||03/12/2012 12:36 pm|
|Does your project have the "Index loaded files for faster search" box checked? If yes, uncheck it and see if that prevents the hang.
I will test the full-screen issue to find out what's wrong. Thank you!
|Eros||03/12/2012 04:35 pm|
|No, I have never had that option turned on.|
|Oleg Chernavin||03/13/2012 10:44 am|
|OK. Can you try another thing - change the project to allow loading only from the starting server. And no external links. Would it hang?
|Eros||03/16/2012 01:31 pm|
|Yet another crash occurred at boyis.com! Grabbing SWF files only.
No idea if it's any help, but...
VS2010's Debugger says:
Unhandled exception at 0x7c8024f0 (kernel32.dll) in OE.exe: 0xC0000005: Access violation writing location 0x00040e80.
Call stack location:
kernel32.dll!__SEH_prolog() + 0x1a bytes
OE's RAM consumption: 1.75GB
CPU usage at max during the crash.
I've read about 0xC0000005 error code..
Bad RAM? No.
Trouble in registry? Maybe, don't know where to search and how to fix anyway.
Incompatibility with DEP? OE always triggers DEP when OE is not in DEP's exclusion list.
Not using SafeSEH? ... .
I've also encountered Runtime error 204 at 079E24D8 several times, when starting the job on previous SWF sites, but then I could just set OE to Download again and that message wouldn't appear again.
But sadly I still have no idea how to reproduce either, unless you do the same job with my settings and just wait. I hope this is not a hardware-specific problem...
|Eros||03/16/2012 02:45 pm|
|Nope, ran 24h fine without any crashes or hangs.
So it's not Kisuki, but an external site it directs to or a site the external site directs to.
But it can't be far, since the hang or crash always happens after the 3rd hour and before the 4th hour on my 12Mbit internet.
It could be easier to identify any bugs if your program had a special bug-analyzing module. I'd be glad to help you in that advanced way.
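As a back-of-the-envelope check on how "far" the crash point can be, here is a small sketch that bounds the data a 12 Mbit/s line can deliver in 3 to 4 hours. It assumes the line runs at full rate the whole time, which real jobs will not reach, so the figures are upper bounds only; the function name is hypothetical.

```python
def max_download_gb(link_mbit_per_s, hours):
    """Upper bound on data transferred, in decimal GB, at full line rate."""
    bytes_per_second = link_mbit_per_s * 1_000_000 / 8
    return bytes_per_second * hours * 3600 / 1e9


lower = max_download_gb(12, 3)  # crash never happens before the 3rd hour
upper = max_download_gb(12, 4)  # crash always happens before the 4th hour
print(f"at most {lower:.1f} to {upper:.1f} GB downloaded before the crash")
```

So the problem content sits somewhere within the first 16 to 22 GB of the crawl at most, and probably much less once server speed and filtering are taken into account.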
|Oleg Chernavin||03/16/2012 02:46 pm|
|Maybe it really depends on the Flash file processing. Does it happen fast with the SWF grabbing project, or also after several hours?
You can post the Project settings here - select it, press Ctrl+C and paste it to the forum message.
|Eros||03/17/2012 02:23 pm|
|No, the last 2 crashed after 8h or so...
Okay, on my last SWF project, I got a brand new error message:
"Not enough storage is available to process this command."
...and then came the crash message.
"Not enough storage..."? If it's about RAM, I forgot to check the RAM consumption, so I can't tell how much it ate...
But this couldn't be hard disk space, because I have over 180GB of free space.
I checked my Event Viewer...
Faulting application oe.exe, version 220.127.116.1134, faulting module kernel32.dll, version 5.1.2600.5781, fault address 0x00012afb. - fault address was the same on my 2 last projects - running out of handles?
Anyway, Boyis.com settings - Settings I use to grab SWF files:
|Oleg Chernavin||03/17/2012 02:24 pm|
|Can you try to change the Project Properties - Parsing - uncheck the Process Flash Video and Process Complex Scripts boxes? Would it make the download stable?
|Oleg Chernavin||03/17/2012 02:25 pm|
|Also, here is the latest version:
|Eros||03/19/2012 12:51 am|
|Good news! sort of...
I did a series of tests and indeed, the culprit at Kisuki was "Explore Flash video links" - if that option is off, then hang/crash won't occur!
I will now test Boyis to see if it survives 24h without that option.
|Eros||03/19/2012 06:21 am|
|So the flash parser is not the only faulty one...
At Boyis, with the flash parser turned off, I now get 2x "Runtime error 204 at 070324D8" and then hang.
I will do a final test without the Evaluate script calculations to see if that will allow this job to complete with success.
|Eros||03/21/2012 03:29 pm|
|Ran Boyis without flash parsing and eval scripts, downloaded 12GB, RAM usage 171MB, ran for 18 hours without crashing or hanging until...
I clicked on Stop
Thread Error: The handle is invalid (6)
Clicked on OK, then on Stop again, the same message appeared.
Impossible to Resume/Start the job - OE's spidering engine hung.
So I closed the program and I got that message again, but OE closed successfully tho.
So, both the flash parser and eval scripts are defective, and not only those, there's also a 3rd thing and maybe even more.
All errors seem to have something in common: thread and handle error.
I really wish I knew, how to debug errors...
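For reference, a small lookup sketch of what the codes and messages reported in this thread mean at the Windows level. The NTSTATUS and Win32 entries are standard system codes; the entry for "Runtime error 204" assumes OE is built with a Delphi/Pascal runtime, which the thread does not confirm (the "EThread" errors only hint at it). `KNOWN_CODES` and `describe` are hypothetical names.

```python
# (kind, code) -> meaning. Kinds: "ntstatus", "win32", "pascal_rte".
KNOWN_CODES = {
    # Exception code shown by the debugger in this thread.
    ("ntstatus", 0xC0000005): "STATUS_ACCESS_VIOLATION: invalid memory access",
    # "Thread Error: The handle is invalid (6)"
    ("win32", 6): "ERROR_INVALID_HANDLE: the handle is invalid",
    # "Not enough storage is available to process this command."
    # Despite the wording this is about memory, not disk space.
    ("win32", 8): "ERROR_NOT_ENOUGH_MEMORY: out of memory or address space",
    # Assumption: Delphi/Pascal runtime error numbering.
    ("pascal_rte", 204): "Invalid pointer operation",
}


def describe(kind, code):
    """Return a short description for a known code, else a fallback string."""
    return KNOWN_CODES.get((kind, code), f"unknown {kind} code {code:#x}")


if __name__ == "__main__":
    for kind, code in KNOWN_CODES:
        print(f"{kind:10} {code:#12x}  {describe(kind, code)}")
```

Read together, these fit a process that runs out of memory or handles and then touches invalid pointers or handles, which matches the observation above that the errors all involve threads and handles.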
|Oleg Chernavin||03/22/2012 05:16 am|
|Errors of this kind are very hard to debug and fix. I will keep working on them to find out the reason.
Thank you very much for the tests! I will verify the script and video parsing code again.