Parsing issues

Author Message
Coriolis 05/01/2006 02:37 pm
I am running OEP 4.2.2377 SR1. The parser consistently hangs on the URL http://www.furaffinity.net/art/ryanrabbat/1146493070.ryanrabbat_rab_lab.swf

Is there any way yet to download flash files without also parsing them? I tend to download packaged, offline flash files. If parsed these just generate online URLs that can never be found anyway.

Is there any way to set a maximum processing time for parsing an entry? This is more of a feature request, but a watchdog thread that checked for the parser freezing up or crashing (and subsequently restarted it) would be the bee's knees. As it is, when the parser hangs or dies there's no way to save the progress of the current downloads, since the save waits for the parsing queue to empty. This is especially problematic when the parser has died hours earlier and the parsing queue is up to six or seven thousand entries.
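
To illustrate the kind of watchdog requested above, here is a minimal Python sketch. It is not how Offline Explorer works internally; `run_with_watchdog`, `parse`, and `timeout_s` are hypothetical names, and the sketch only detects a stuck parse call (a Python thread cannot be killed from outside, so a real watchdog would more likely restart a separate parser process).

```python
import threading


def run_with_watchdog(parse, entry, timeout_s):
    """Run parse(entry) in a worker thread and give up after timeout_s seconds.

    Returns (finished, value). finished is False when the parser is still
    running after the timeout, so the caller can skip or requeue the entry.
    If parse raises, the worker ends without a value and (True, None) is returned.
    """
    result = {}

    def worker():
        result["value"] = parse(entry)

    # Daemon thread: a hung parser will not keep the program alive on exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        return False, None
    return True, result.get("value")
```

With a helper like this, a download loop could log an entry whose parsing timed out and move on, instead of letting one entry block the whole parsing queue.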
Oleg Chernavin 05/02/2006 07:45 am
Thank you! I fixed this issue. I hope to have the new version released this week.

Best regards,
Oleg Chernavin
MP Staff
Peter 05/25/2008 05:46 pm
Dear Sirs,

My large download project (~17,000 files, 1.1 GB) is pretty much done: practically no further files are being downloaded. Only the "Parsing" is still going on.

Question:
Can I simply stop the download project (red "Stop" button), which would interrupt the parsing process (currently at "Parsing (1934)", coming down slowly from "Parsing (5123)"), and then continue my project without problems the next day (with the radio button at "Download only Modified and new files", of course)?

Or, what exactly happens if I interrupt a parsing process (when the bulk of the files seems to be downloaded already) and then press the green "Download" button again? (Will *all* files then be re-parsed? And how does it affect the 'download of only modified and new files'?)

No doubt I would be on the safe side by deleting the 1.1 GB and running the project from scratch, this time without interrupting the (final) parsing process :-|

Thanks for some help, best regards
Peter

Btw, I love your product(s)! :))
Oleg Chernavin 05/26/2008 06:54 am
I would suggest waiting, because parsing means that some downloaded files have not yet been processed to extract links from them and to change these links for offline browsing.

If you stop now, you can continue any time later by choosing "Do not load existing files" in the Project Properties dialog or by selecting the Project and pressing Ctrl+F5. However, this will force Offline Explorer to go through all 17,000 files again to parse them.

Oleg.
Peter 05/26/2008 12:36 pm
> I would suggest waiting, because parsing means that some downloaded files have not yet been processed to extract links from them and to change these links for offline browsing.
>
> If you stop now, you can continue any time later by choosing "Do not load existing files" in the Project Properties dialog or by selecting the Project and pressing Ctrl+F5. However, this will force Offline Explorer to go through all 17,000 files again to parse them.
>
> Oleg.

Hello :-)

Thanks for the quick, helpful reply with detailed explanations. What I actually did yesterday (when I had to leave the PC pool) was click the "Suspend" button (because that sounded reasonable, like the 'Pause' button of a tape deck). The icon of my project changed from a green arrow to yellow double bars. I thought "fine! tomorrow I'll continue from that point onwards". I closed the software, shut down the PC and left the place. Today I am back to continue. I launched the software, and ... the icon of my project is a red stop. :-?

I guess the icon changed from 'pause' to 'stop' because I had closed the software last night.

Thankful for your quick reply, I have followed your instructions and pressed "Ctrl+F5".

I'm learning... is the following correct, then?
button-"Download missing files Ctrl+F5" === project properties-"Do not download existing files"
button-"Update Project Shift+F5" === project properties-"Download only Modified and new files"
button-"Restart Download Alt+F5" === project properties-"Download All files"

Question1: Does that download button (3 alternatives) override my project properties when I press it, without overwriting (= changing) them? [The existence of the 3 equivalences is a bit confusing. If I have clearly defined my project properties, why would I then want to choose between three alternative download buttons? A single green button called "Run the project" would be less confusing.]

Question2: Why wouldn't "Shift+F5" have been the correct instruction, since the 17,000 files were the bulk (an estimated 98%) but not the complete download result? [During the long parsing yesterday, the tool caught a few additional small files here and there, thus filling the remaining 2%. So I thought that the "(Download only Modified and) NEW FILES" *are* the missing 2%.]

Question3: Let's assume that Offline Explorer is completely done with the project. If I want the tool to rerun the project in order to download the files which it didn't catch on the first run (e.g. because of server timeouts), which I wouldn't necessarily call an "Update the Project", which is the better choice, Ctrl+F5 or Shift+F5, and why? Or do both lead to the same result in the end (given that there are no changes on the server in the meantime)?

Question3 arises from my doubts about how I can be 100% sure that OE downloaded all files completely: no missing ones, and all fully parsed.

Thanks again for your help and attention,
best regards,
Peter
Oleg Chernavin 05/26/2008 12:44 pm
You are correct about the meaning of the xxx+F5 buttons.

1. Yes, these keypresses override the Project settings temporarily, only for this download. They do not actually change the settings.

2. Shift+F5 would do another job in your case: it would connect to the server for each of the files and ask whether the file has changed or not. Ctrl+F5 will go through all files on your disk to make a list of the missing ones, and it will get only them.

3. Ctrl+F5 is also enough here, because if some pages have links that are allowed by the Project settings but there is no corresponding file, Ctrl+F5 will make Offline Explorer download such files.
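
The difference between the three download modes discussed in this thread can be sketched as a small decision function. This is only an illustration of the behaviour described here, not Offline Explorer's actual code; `Mode`, `should_download`, and the two boolean parameters are hypothetical names.

```python
from enum import Enum


class Mode(Enum):
    MISSING = "Ctrl+F5"   # Download missing files
    UPDATE = "Shift+F5"   # Download only modified and new files
    RESTART = "Alt+F5"    # Download all files


def should_download(mode, exists_locally, server_says_modified):
    """Decide whether one linked file is fetched in the given mode."""
    if mode is Mode.RESTART:
        return True                   # everything is fetched again
    if not exists_locally:
        return True                   # missing or new files are always fetched
    if mode is Mode.UPDATE:
        return server_says_modified   # existing files only if the server reports a change
    return False                      # Mode.MISSING never re-fetches an existing file
```

In this model, Ctrl+F5 never contacts the server for a file that is already on disk, which is why it suits a "make sure nothing is missing" run better than Shift+F5.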

Oleg.
Peter 05/26/2008 01:21 pm
> You are correct about the meaning of the xxx+F5 buttons.
>
> 1. Yes, these keypresses override the Project settings temporarily, only for this download. They do not actually change the settings.
>
> 2. Shift+F5 would do another job in your case: it would connect to the server for each of the files and ask whether the file has changed or not. Ctrl+F5 will go through all files on your disk to make a list of the missing ones, and it will get only them.
>
> 3. Ctrl+F5 is also enough here, because if some pages have links that are allowed by the Project settings but there is no corresponding file, Ctrl+F5 will make Offline Explorer download such files.
>
> Oleg.

Thanks Mr. Chernavin. :))
Your explanations were *very* helpful. The project completed 2 minutes ago, so I have just pressed Ctrl+F5 again (mouse click on the download-button menu item) for another run. I'll then compare the "before" and "after" results with the help of a cataloguing tool.

I am very(!) curious to know how many files are missing after this (combined: yesterday + today) 1st run.

I should time the 2nd run ;)

Thanks so much.. Will TTYL, have a nice day!
Peter 05/26/2008 02:18 pm
> I am very(!) curious to know how many files are missing after this (combined: yesterday + today) 1st run.
>
> I should time the 2nd run ;)


Hello again :)

Interesting, the 2nd run (Ctrl+F5) was completed rather fast (compared to yesterday's parsing).
A comparison of my download folder 'before' and 'after' the 2nd run shows that *no new files* (hooray!!) were downloaded, which means that OE did a perfect job catching all files on the very first run (= today's first Ctrl+F5). Congratulations!

There are ~300 "changed" files (= re-downloaded files of exactly the same file size, now having a different file date, of course, which qualifies them as "changed" ;-), but more interestingly:

before: there were ~290 primary files (*.primary-format)
after: there is only 1 primary file left.

This means that the 1st run was not as perfect as I just cheered, am I correct? (Is it more or less true that the existence of *.PRIMARY files in a download folder is a sign of an imperfect/incomplete download project?)

Thanks for all, best
Peter
Oleg Chernavin 05/27/2008 04:59 am
Yes, Offline Explorer saves files with the .primary extension before parsing them, and when parsing completes, they get removed.
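
Based on that, leftover .primary files can serve as a rough indicator of files that were downloaded but not yet parsed. A minimal Python sketch for listing them follows; `find_primary_files` and the example folder path are hypothetical, and only the .primary extension itself comes from this thread.

```python
from pathlib import Path


def find_primary_files(download_dir):
    """Return all files below download_dir that still carry the .primary extension."""
    root = Path(download_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() == ".primary"
    )


if __name__ == "__main__":
    # Hypothetical download folder; replace with the project's own directory.
    for leftover in find_primary_files("C:/Download/MyProject"):
        print(leftover)
```

An empty result after a completed run suggests that parsing finished for every downloaded file.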

Oleg.
Peter 05/27/2008 07:25 am
> Yes, Offline Explorer saves files with the .primary extension before parsing them, and when parsing completes, they get removed.
>
> Oleg.


Thanks for that confirmed info.
Another day has passed. Yesterday, after the "Ctrl+F5" hooray (= no new files downloaded and all PRIMARY files deleted), I thought 'hey, let me update the project to see what happens'. So I pressed "Shift+F5".

Observation1: OE did indeed check all files for modification and downloaded only the changed files (~4 MB). The "View>Messages>Panel|Log Window" indicated clearly and in detail what OE did. All fine.
Well, since I was running out of time again, I had to stop the Update Project [@point1].

Today I wanted to continue with it OR rerun it (it = updating my project), because I had interrupted the Update Project yesterday [@point1]. I guessed that OE can't continue from where I had left off [@point1], but would re-check *all* files again. No problem, today I have more time for all this. So I pressed "Shift+F5".

Observation2: This time, OE didn't do any file modification check but downloaded *all* files from scratch, beginning with 0.0 GB, as if my download folder were empty. So the Update Project function did not work as I expected.

Could you confirm this behaviour and explain it? (Why would OE restart the project from scratch instead of updating it, although the project settings are set to "Download only Modified and new files"? - Different behaviour today vs. yesterday.)

Thanks for your support, Peter.
Oleg Chernavin 05/27/2008 08:14 am
I would need to see logs (Ctrl+Q) with the details of what exactly it asks the server for, etc. It is hard to guess what could be wrong.

Oleg.
Peter 05/27/2008 08:51 am
> I would need to see logs (Ctrl+Q) with the details of what exactly it asks the server for, etc. It is hard to guess what could be wrong.
>
> Oleg.

I checked the logs. The log "Disconnect Status" says only 'Download complete. URL: http://www. etc.' and the log "Details" don't show any special clues either. [...]

Interestingly (I checked the file dates), the JPEGs were/are not re-downloaded. So OE does indeed seem to follow one of the 'Update Project' rules, called "Skip existing Media files".

But, apart from the media files, all other files (mostly *.ASP format) are being re-downloaded.
Maybe my Observation2 has something to do with cookies, since I'm downloading from a login/password-protected website, where I first log in (OE's internal browser) and then fire off the download project.

Anyway, I'll press the green Download button again (project settings set to the "Shift+F5" type) once the parsing is done (in 1-2 hours). If the Update Project then works as expected, I could be right with the 'cookies assumption', couldn't I?
Peter 05/27/2008 10:04 am
> Anyway, I'll press the green Download button again (project settings set to the "Shift+F5" type) once the parsing is done (in 1-2 hours). If the Update Project then works as expected, I could be right with the 'cookies assumption', couldn't I?

Parsing done. I have just pressed "Shift+F5". Now I get the desired update result: the log says "Read Transaction complete. Status: 304 Not Modified.". So now the Update Project function works. Hours ago, the Update Project (of the identical OE project; the only difference: yesterday vs. today) did not work as expected.

Anyway, I am happy today (in total, 3 days (incl. today) of working on the same OE project). And I will also press "Ctrl+F5" afterwards, just to be sure.

If you have a possible explanation for the 2 different Update Project behaviours I observed, please let me know. Thanks for all.

Peter
Oleg Chernavin 05/27/2008 10:42 am
It may depend on the server. Offline Explorer stores the file modification date (provided by the server) in descr.wd3 files in the download directory.

When you choose to update, Offline Explorer asks the server to send the file only if its modification date has changed. If the server responds with 304, the file has not changed. Otherwise, it responds with 200 OK and lets Offline Explorer get the file.

This is how it works.
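
The exchange described above matches a standard HTTP conditional request. A minimal Python sketch follows, assuming the stored modification date is sent in an `If-Modified-Since` header; the thread does not say which header Offline Explorer actually uses, and `build_update_request` and `file_changed` are hypothetical names.

```python
from email.utils import formatdate
from urllib.request import Request


def build_update_request(url, stored_mtime):
    """Build a GET request that asks for the file only if it changed.

    stored_mtime is the previously saved modification time as a Unix timestamp.
    """
    request = Request(url)
    # HTTP dates are always expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    request.add_header("If-Modified-Since", formatdate(stored_mtime, usegmt=True))
    return request


def file_changed(status):
    """Interpret the server's answer to a conditional request."""
    if status == 304:
        return False   # Not Modified: keep the local copy
    if status == 200:
        return True    # OK: the body contains the current version
    raise ValueError(f"unexpected status for an update check: {status}")
```

Whether a given page ever answers 304 depends entirely on the server, which is why the same Update run can behave differently for different kinds of files.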

Oleg.
Peter 05/27/2008 11:50 am
> It may depend on the server. Offline Explorer stores the file modification date (provided by the server) in descr.wd3 files in the download directory.
>
> When you choose to update, Offline Explorer asks the server to send the file only if its modification date has changed. If the server responds with 304, the file has not changed. Otherwise, it responds with 200 OK and lets Offline Explorer get the file.
>
> This is how it works.
>
> Oleg.


Thanks. [...]

I'm observing (from the OE log "Disconnect status") that web pages of a specific type of content (contenttype1) are being re-downloaded (= updated) by OE although their content is the same (same file size, same CRC32), whereas web pages of another type of content (contenttype2) are returned as 304 (same file size, same CRC32).

So in the end it's the server's fault, I guess: the server permanently changes the file modification date of files of contenttype1 (although the files didn't change CRC32-wise), and leaves the files of contenttype2 in peace. This would explain OE's Update Project behaviour (right now). Question: Do such strange servers / server behaviours exist? [...]

All I was/am interested in is a *100% complete downloaded project, with no missing files*. In the past 3 days I was never really interested in an "Update Project", since it was the first time that I created and ran the project, so the changes on the server would be minimal and totally negligible. The current run of "Shift+F5" (today's 2nd time!) indicates that ~300 MB are being re-downloaded (= updated), but I know that only a few kilobytes can/might have changed in the meantime (since today's 1st time). A 'before & after' folder comparison proves it.

In conclusion, I should have stuck with "Ctrl+F5" and never pressed "Shift+F5" (yesterday 1x (but interrupted), today 2x).

I learned quite a lot about the reasons for OE's behaviour in these days. So next time I'll be faster at concluding my OE projects :)
Oleg Chernavin 05/27/2008 12:07 pm
Yes, different pages are reported differently by servers. Some of the pages are dynamic, like PHP, ASP, JSP, etc. They are generated on the fly by the server each time you request them. So, for the server they are always new or changed.
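
Since a dynamic page may be re-downloaded even when nothing changed, a checksum comparison like the CRC32 check mentioned earlier in the thread can confirm whether the content really differs. A minimal Python sketch follows; `crc32_of_file` and `same_content` are hypothetical helper names.

```python
import zlib


def crc32_of_file(path, chunk_size=65536):
    """Compute the CRC32 of a file without loading it into memory at once."""
    checksum = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.crc32(chunk, checksum)
    return checksum


def same_content(path_before, path_after):
    """True when both copies have the same CRC32, regardless of their file dates."""
    return crc32_of_file(path_before) == crc32_of_file(path_after)
```

Comparing a backup copy of the download folder with the folder after an update run in this way separates files that were merely re-fetched from files whose content actually changed.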

Oleg.
Peter 05/27/2008 04:45 pm
> Yes, different pages are reported differently by servers. Some of the pages are dynamic, like PHP, ASP, JSP, etc. They are generated on the fly by the server each time you request them. So, for the server they are always new or changed.
>
> Oleg.

Ahh, this explains all and everything! Thanks for the enlightenment. :)
[...]

After 3 days, I am finally down with my (test) download project. The number of files didn't change after the (yesterday's and today's) Shift+F5's and the various (yesterday's and today's) Ctrl+F5's. And even on day 1, all was going well (complete download; I only interrupted the project during the final phase of parsing, which resulted in "additional" *.primary files). Summing up, if I hadn't interrupted the project on day 1, I would have been as done with the project as I am now! :D

So for this large download project (17,000 little files is quite a lot for Windows to handle), OE worked perfectly from the very beginning and did not miss any file!

Dear OE tool, congratulations! And dear chief coder (Mr. Chernavin), congrats as well!!
My dearest thanks to you and your marvellous product!
Please keep on going with the development of this tool, the world's #1 offline browser.
With best regards and deep appreciation,
Peter

(If there is any 'feature request' thread ;) in this forum, please link me to it, thnx. :-)
Peter 05/27/2008 04:47 pm
> After 3 days, I am finally down with my (test) download project.

should read:
> After 3 days, I am finally done with my (test) download project, yahoo!

:)=)
Oleg Chernavin 05/28/2008 05:01 am
This is great! Thank you for your kind words!

Oleg.