Problems with large multiple files

Author Message
gunnar 05/11/2007 07:00 pm
I am mainly asking in case something has been done or improved about this old problem (old for me, at least) since version 3.3.

That is, when downloading multiple large (100-300MB) rtsp rm files simultaneously, the following happens:

- the first file is done, and OEP starts moving and processing the tmp file into the primary file and then into the final file.

However, this takes all the processing power and heavy hard disk work, so OEP does not keep up with the rtsp protocol for the remaining files.

In my case, they time-out with
[5/12/2007 1:08:45 AM] RTSP1: Download complete. Status: 200 OK.
even when these other files are not fully downloaded.

The one which got fully downloaded, and caused the problem, gets
[5/12/2007 1:07:25 AM] RTSP1: Download complete.
(nothing about Status, 200 OK)

Note the time difference, about one minute and 20 seconds, which is approximately the time it takes to process the tmp, primary and final rm-file (with the fastest available hard disks, UDMA-5, etc.)

I have the OEP time-out set to some 900 seconds (with at most 10 retries).

hmm.. should I have a much shorter OEP timeout and much more retries???

I think it is a matter of the rtsp-server's time outs??

Note, earlier I have been able to avoid this same problem by just downloading one file at a time, but at "full speed" and using a 250s "delay between downloads" to give OEP time to finalize the file (it usually takes some 100-150 seconds for typically 300-400MB files).

However, the server and OEP no longer agree to download a file at "turbo-speed", 4-8x the "streaming speed" of the (archived) rm-file. That is, the files are now downloaded at their streaming speed, some 13kBps instead of the earlier 40-60kBps.
That is why I am (once again) running into this time-out problem with multiple downloads while OEP is doing the tmp-primary-final processing: I do not want to download only one single file at a time at the slow "streaming speed".

Oleg Chernavin 05/12/2007 10:37 am
I implemented a workaround against unnecessary processing of such streaming files in recent versions. However, I want to check this again. Could you please give me a couple of links to such RTSP files?

Thank you!

Best regards,
Oleg Chernavin
MP Staff
gunnar 05/12/2007 01:35 pm
Search-search-search, but I just replaced the bunch (project) of rtsp links I had problems with..

However, if I try to reconstruct something like it, it would look like this, as an example of three simultaneous downloads:

- very long
- almost as long
- long enough to need plenty of tmp-etc processing, but short enough to be done first; the first two are then cut short by some time-out (download OK, but "Status: 200 OK" for the first two)

rtsp://video.c-span.org:554/archive/arc_btv/btv050507_4.rm
rtsp://video.c-span.org/60days/ap050507.rm
rtsp://video.c-span.org/15days/wj050907.rm


That is, the third file is the shortest, 3 hours and some 180 MB, and will be ready before the first two are downloaded (if I got it right).

When the third, shortest, but still large enough file is done, OEP starts doing the tmp-primary-final processing and the first two are (abruptly) ended by some time-out.

Another good candidate for the third (shortest but long enough) file might be this one (4 hours, more than 250MB):

rtsp://video.c-span.org/15days/e042407_fda2.rm

That is, I had to download the first two, the exceptionally long ones (300-400MB), as single downloads to finally get them fully and correctly downloaded.

Note, earlier I could download files like this "one-by-one" with 250 seconds in between, at 5-6x the streaming 100kbps speed. But for the last two or three weeks I have not been able to get more than 1x to 2x the streaming speed, which is why I try to download 2-4 at the same time with a 500-700kbps max speed.

If this continues I think I will be able to give you some real examples next week. The problem is that it takes some 3-4 hours to repeat the problem.
(the 3-4-5 hour, 200-400MB long files are usually from the weekends, the daily files are usually shorter)

Note, the load on the "video.c-span.org" server (or the way I connect to it) also varies according to the time of the US day/evening/night as well as especially the weekends.

Nobody promised it would be easy and simple..

Gunnar






Oleg Chernavin 05/14/2007 06:29 am
I tried this in the latest 4.7 version and it works fine. I looked at the lines in the log:

RTSP0 - 14.05.2007 14:26:03 - 9 seconds of 22664 of rtsp://video.c-span.org:554/archive/arc_btv/btv050507_4.rm.
RTSP0 - 14.05.2007 14:26:03 - Writing index of rtsp://video.c-span.org:554/archive/arc_btv/btv050507_4.rm.
RTSP0 - 14.05.2007 14:26:03 - Download complete. Status: 200 OK.

The file is not parsed - it is copied in the background now.

Oleg.
gunnar 05/15/2007 06:04 pm
Question, observations and more questions.

Q1. In the test you did, did you do a multiple simultaneous download of all three files??
(just doublechecking)

Latest observation:

My problem might have to do with Windows (XP) needing to work on, or with, its swap file for virtual memory??
That is:
I usually run these multiple downloads on an (old) computer, a Pentium 4 at 1.3GHz with unfortunately only 338MByte of RIMM-RAM (expensive RAM, so I never bought more). It is mainly "functioning" by using a lot of virtual swap memory, with not much RAM left for system caching etc.

That is, I managed to watch one case in more detail, the one I tried to explain: three large files, and the first one to finish downloading starts the tmp/primary/final file processing.

I watched the usage of the hard disks using the performance monitor of XP, reads and writes to the hard disks.
I also watched the internet connections using NetLimiter, which shows the connections of OEP and their speed.

This time the tmp-primary etc. processing took much more time than I had expected: instead of some 100 seconds, at least some 3-4 minutes. The disk LEDs were blinking, but not "always on"; instead they blinked fairly slowly.

On the network activity I observed that the speed for the two remaining (larger) files went to zero and then the two connections suddenly "disappeared" (somebody disconnected them).

Conclusion: Maybe Windows decided to rearrange its virtual RAM (swap file) because the first (large enough) file did not fit in the actual RAM during the tmp etc. processing??

It seems possible that Windows has problems handling the other processes while it is "working hard" on its swap file??

Note: I have never had problems with downloading (multiple, simultaneous) files smaller than 50-90MByte; the problems start above approximately 100-150MByte.

That is, more questions:

Can you give some information on what OEP does with the tmp file??

Q2: Is OEP doing a "pure" copy to the primary file?? (In that case it would behave differently depending on whether the tmp file and the final destination are on different partitions or the same partition; a move within one partition only changes the FAT table entry.)

Q3: Does OEP try to load the whole tmp, primary or final file into memory (that would get Windows into trouble if there is too little RAM and make it start working on the swap file), or does OEP read and write the files in small blocks??

Q4: At what stage does OEP actually go through and parse the whole file, finding the packets and indexing them, etc., and then write the final index block at the end of the file??

Q5: I assume OEP is doing that parsing and indexing at some point, because the tmp files do not have the index block (the index table) at the end??

Btw, I think this also has something to do with this observation about playing tmp-files:
- if the tmp files are the result of downloading a live stream, RealPlayer will not play them, but mplayer plays them.


One more observation:

Sometimes, especially when I download 2-3 simultaneous rtsp streams and "finalize" one of the files, I can clearly see (with NetLimiter, as well as the data LED on the DSL modem) that the connections of the two others continue.

This happens even if the stream I am finalizing is a "large file", larger than 100-300 MBytes, and the OEP processing of it takes a minute or two.

Sometimes the connection speed of the "two others" drops to zero for some "20-50 seconds" but pretty soon picks up speed again, even going "double speed" (to catch up).

However, when OEP is working hard with those files one cannot really trust any measurements or monitoring of what Windows is actually doing (only the data LED of the ethernet connection can be trusted).

Ouch, an OEP crash, more in the next posting




gunnar 05/15/2007 06:44 pm
Just some info on "recovering" from OEP crashes and time-outs of (live!!) streams.

That is, if the rtsp file is an archived file it is just a matter of restarting, rebooting and re-downloading it, the ordinary pain. That is not the case for live streams, because they might be totally lost.

My first observation was that the tmp files were sometimes left in the tmp directory when "something goes wrong", especially when OEP crashes, on internet time-outs or whatever (it happens, if one does not restart OEP or reboot Windows every now and then).

The problem was still that OEP often deleted the tmp file, for example when OEP made a new attempt (and a new tmp file) to continue downloading the same live stream (I have not really figured out how and when it does it, but it does it).

I then started to use an "undelete" tool and made it watch the tmp directory of OEP.

This turned out to work great. If set up correctly it does not actually copy the files that are being deleted (which would take a lot of time); it just marks the file's directory (FAT table) entry as "used", "protecting" the file from overwriting and making it possible to recover it.

Before that I had noticed that if I opened the tmp-files with some other program (mplayer, or anything) I could "protect" them from being overwritten (I started to write my own "protect" program, but then understood that there must be some "undelete tool" which produces the same result).

One practical thing with this "undelete" tool is that I can tell it to purge deleted files older than 2-3 days.
(although disk space is cheap these days)

That is, it is fairly quick to check whether the tmp-files in the "undelete" program match, in size and date, the files OEP has actually processed to final files.
If they all match, there have been no problems.
If there are some tmp-files which cannot be found as primary or final files...ahaaa!! that one had an OEP crash, a time-out, or some other disconnect problem!!

However, these tmp-files (from C-SPAN live streams) cannot be played in RealPlayer, but mplayer plays them (except if they have some really bad packets in them). That's why I asked for a button for OEP to process selected tmp-files to final files.


OEP could do something similar to avoid deleting and overwriting tmp-files, and even give some "Alarm" and a possibility to recover them (and then reprocess them: finalize the parsing, indexing, etc.)

-----
Another "trick" when I download (C-SPAN.org) live streams:

OEP (this old version 3.3) tends to crash or "not work", for example, when downloading 2-3 live streams and then finalizing one, and then starting a new one (which goes to the same RTSPn connection).

My nice trick, which has almost totally solved these problems, is this:

- I am downloading three live streams..
- I start a fourth (overlapping) download of the stream I am about to finalize, and wait until I see it is actually downloading
- I finalize the one I wanted to finalize (to avoid the file getting too large)
- I wait until the finalizing file is fully finalized

here comes the trick!!

- then I decrement the number of connections back to three or less (in this case).
- this makes OEP disregard the RTSP connection that I just finalized, which often does not work for another, new, download.
- OEP often complains about something crashing somewhere, but "who cares". (it seems that when OEP complains as I decrement the connections, OEP is trying to "clean up", but that is exactly when a new download would not work)
- Then I increment the number of connections back to one or two more than what is needed.

- It seems these new connections are "pure as at birth", that is, I can start new downloads with them.. (and the old RTSPn connection, which probably would not have worked, has somewhat happily disappeared into the magic space of leaking Windows memory... sometimes with the happy sound and sight of OEP error dialogs..)


Funny stuff..

Gunnar
gunnar 05/15/2007 07:05 pm
Status: 200 OK.???

This is a 3.3 question, so I do not really expect an answer.

However, one of the best signs of a truncated download of archived RTSP files is the difference between "Download complete" and "Download complete. Status: 200 OK" (nothing to do with live streams, and all of my experiences relate to the C-SPAN.org servers, which are different for live and archived files).

That is, when an archived rm-file is downloaded correctly, I get only "download complete".

When the (archived) file was not downloaded correctly I get that "Status 200"

Funny stuff.. but this "status 200" is, at present, my fastest way to find out when some (archived) files have been downloaded incorrectly (truncated).


That is, I have decided that in OEP-speak "Status, 200 OK" means that the file was not correctly downloaded... (but this is true only for archived rtsp-files, not for live streams and only an observation based on C-SPAN.org files)

Funny stuff..
gunnar 05/15/2007 07:30 pm
Minor correction, 3.3 says "[5/16/2007 1:49:37 AM] RTSP1: Download complete." when everything went correctly.

When it talks about "Status 200" something has gone wrong, the file is truncated.
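
For anyone who wants to automate this check, here is a minimal sketch in Python. It assumes the 3.3 log line format quoted in this thread; the regular expression and the function name `classify_log_lines` are illustrative assumptions, not anything OEP provides, and the rule is only claimed for archived files from the C-SPAN.org servers.

```python
import re

# Matches lines such as:
#   [5/12/2007 1:07:25 AM] RTSP1: Download complete.
#   [5/12/2007 1:08:45 AM] RTSP1: Download complete. Status: 200 OK.
LINE_RE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]\s+(?P<conn>RTSP\d+):\s+Download complete\."
    r"(?:\s+Status:\s+(?P<status>\d+)\s+OK\.)?\s*$"
)

def classify_log_lines(lines):
    """Return (timestamp, connection, verdict) for each 'Download complete' line.

    A line that carries a 'Status: ...' suffix is marked as a suspected
    truncated download; a bare 'Download complete.' is marked complete.
    """
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a 'Download complete' line
        verdict = "suspect-truncated" if m.group("status") else "complete"
        results.append((m.group("stamp"), m.group("conn"), verdict))
    return results

# The two log lines quoted earlier in this thread.
sample = [
    "[5/12/2007 1:07:25 AM] RTSP1: Download complete.",
    "[5/12/2007 1:08:45 AM] RTSP1: Download complete. Status: 200 OK.",
]
for row in classify_log_lines(sample):
    print(row)
```

The 4.x logs shown elsewhere in the thread use a different layout (`RTSP0 - 14.05.2007 14:26:03 - ...`), so the pattern would need adjusting there.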

I cannot doublecheck it now, but I guess that this is why OEP (3.3) always says "Status: 200 OK" for live (not archived) streams, because live streams are never fully and finally downloaded nor "completed"??
(OEP has no other possibility than "forcefully ending the stream", that is, cutting the live stream "short" and then starting to index etc. the truncated file of a live stream which continues for ever and ever)..

One more weird observation..

OEP seems to check the size, the seconds, or something, of archived streams (files): of the file already on the hard disk and of the file OEP is about to start to download.

That is, when the same (archived) file is already downloaded, but maybe not fully, correctly downloaded, OEP starts reading the already downloaded file, maybe trying to find out where to continue downloading, or something like that. (the options of "only download new, unmodified files", etc,etc..).

I am forced to admit, I have never really figured out those OEP-options, "new, modified, existing, etc" plus that numbering system of multiple copies of files, while I have been downloading all these Terabytes of c-span.org files..

When I use my own brain they do not really make any sense compared to how the settings behave...
(but they seem to work for html-files?? although I do not do much of that).

PS OEP seems to behave even more strangely when I do not exit and restart OEP after a failed, truncated download of an archived rm-file. OEP starts looking for some primary file of the same failed download and complains because it cannot find the primary file, or anything useful on how to complete it..

I have come to the conclusion that OEP has found some information in the WD3 files, or somewhere else, but it becomes just as confused as I often am..

That is, is there

- some good explanation on this tmp/primary/final file??
- what is there in the WD3 files

PS Except for these funny, most human behaviours, OEP is really great..


Oleg Chernavin 05/21/2007 03:47 pm
I am sorry for not answering sooner. This delay happens because OE started to parse these huge files and it took time to copy them to another folder. Later I added code to copy in the background (in another thread), and it now also checks the beginning of the file: if it is media, there is nothing to parse and the file is simply copied in the background.

The feature to convert temp .rm files into playable ones is not possible, because OE keeps the index in memory, and when the file is being finished, the index is added to the file from memory. So after OE crashes, there is no data to add to the file to finish it.

Media Player files can be really played as they are - there is no index to add.

Yes, Status 200 means that a timeout happened or the server closed the connection and we haven't got all data packets yet.

In fact, Finalize emulates the situation when a server closes connection. It sends the close socket message to itself and saves the file.

When you start the download in the "Download only modified and new files" mode, OE gets the number of seconds from the downloaded file and tries to get the rest. descr.wd3 files contain the MIME type, date/time modification information (returned by the server) and the original file size in bytes (for the check size feature).

Oleg.
gunnar 05/29/2007 01:28 am
I have been able to verify "the problem" a couple of times, by watching memory resources and hard disk activity and monitoring the speed of the internet connections.

It seems my XP installation tries to keep approximately 70-100MB of free, available (physical) RAM.
This seems to mean that rm-files smaller than that 100MB are processed "normally".
But when the file is 150-200MB, it seems XP starts messing around, swapping stuff to the swap file.
This can go on for a much longer time than when the file is small enough to fit in the available physical RAM.
For example, 100-200 seconds instead of just some 10-30 seconds.

It also seems that this (swap file stuff) is when the connections of the other files being downloaded are not "running".
I am guessing that this triggers a time-out in the rtsp-server.
A couple of times I have been able to use the OEP "Restart" (file download, right click on channel) to "get the download going again" for these other files which are not yet done.


However, when the above problems happen OEP often starts "having problems", error messages at exit, new downloads do not start, or the tmp-final processing does not get done, etc, etc.

---
Re your reply:

Q1: From what version did you make those changes (another thread, etc.)??

Q2: To doublecheck "no parsing of media files": are you saying that archived rm-files do not need any parsing?? But they still need the index table to be added??
That is, it seems that is one difference between the tmp file, the primary file and the final file??
(I have not yet taken the time to figure out the exact differences)

Q3: That is, I am just trying to get a "feeling" for the temp-primary processing and the primary-final processing. (Monitoring disk read-write activity, CPU load and free RAM seems to point to the primary-final processing, but then maybe the temp-primary processing has already used up most resources??)

---

Anyway, I managed to find some used RIMM-RAM for a decent price, so in a week or two I will see if another 256MB will get rid of this problem.


PS A "long time ago" I had (maybe) similar problems. I solved them by making sure I was reading and writing the files in small chunks (4-16kB), flushing the write file in between the chunks and even closing and reopening both the read and write files.
I also put a Windows message loop in between every read-write of the small chunks to make sure other processes got their messages served. (That is, I am not really sure what helped, but finally the program managed to process the large files with only a small amount of used RAM, while the other processes and threads were still able to run, although slowly.)
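
A minimal Python sketch of that chunked approach, for illustration only. It shows the small-block read/write with a flush after every block; the closing and reopening of files and the Windows message loop are left out, and nothing here is claimed to be how OEP copies files. The function name `copy_in_chunks` and the 16kB default are assumptions.

```python
import os
import tempfile

def copy_in_chunks(src_path, dst_path, chunk_size=16 * 1024):
    """Copy src to dst in small blocks, flushing after each block.

    Memory use stays close to chunk_size no matter how large the file is.
    Returns the number of bytes copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(chunk_size)
            if not block:
                break  # end of file
            dst.write(block)
            dst.flush()             # push Python's buffer to the OS
            os.fsync(dst.fileno())  # ask the OS to write the data to disk
            copied += len(block)
    return copied

# Small demonstration with a throwaway 1 MB file.
with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, "sample.tmp")
    dst = os.path.join(work, "sample.rm")
    with open(src, "wb") as f:
        f.write(os.urandom(1024 * 1024))
    print(copy_in_chunks(src, dst), "bytes copied")
```

Syncing after every block trades throughput for a steady, small memory footprint, which matches the trade-off described above (slow, but other work keeps running).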

Gunnar


gunnar 05/29/2007 01:44 am
wd3 files.

Note, I am just downloading and downloading to the same directory, and every now and then I move the oldest downloaded files to another hard disk.

Should I delete all the wd3 files every now and then??

That is, I only use the "do not download existing media files", and I do not download millions of small html,etc files, but maybe "thousands" of 50-300MB rm-files.

That is, I have never been able to get the "Download only modified and new files" option to work, or to do something I would understand, for these rm-files.


However, sometimes, or maybe most times, OEP seems to start reading a file which has not been fully, correctly downloaded.
But I have not checked this behaviour systematically, because I have my own program which keeps track of already downloaded files and makes a list of the new files I want to download.
(My problem is that my own program cannot check whether the downloaded files are fully or only partially downloaded)

PS Btw, that "status 200" seems to reliably flag (archived) rm-files which get only partially downloaded due to this "time-out problem".
Another way I can detect them is to check the dates and times: whether they were finalized within some seconds or a minute of when a large file finished downloading (causing time-outs for these other files).
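
The date-and-time check can be sketched in Python as follows. This is a rough illustration under stated assumptions: the function name, the 100MB "large file" threshold and the 120 second window are guesses based on the numbers mentioned in this thread, and the sample entries are made up for the demonstration (only the two timestamps come from the log lines quoted earlier).

```python
from datetime import datetime

MB = 1024 * 1024

def suspects_after_large_file(downloads, large_bytes=100 * MB, window_s=120):
    """Flag files that ended shortly after a large file finished.

    downloads: list of (name, finished_at, size_bytes) tuples.
    A file is flagged if it ended 0..window_s seconds after some
    other file of at least large_bytes finished.
    """
    suspects = []
    for name, done, _size in downloads:
        for other, other_done, other_size in downloads:
            if other == name or other_size < large_bytes:
                continue
            gap = (done - other_done).total_seconds()
            if 0 <= gap <= window_s:
                suspects.append(name)
                break  # one trigger is enough to flag this file
    return suspects

# Hypothetical sample data.
sample = [
    ("a.rm", datetime(2007, 5, 12, 1, 7, 25), 180 * MB),  # large file, finished normally
    ("b.rm", datetime(2007, 5, 12, 1, 8, 45), 60 * MB),   # ended 80 seconds later
    ("c.rm", datetime(2007, 5, 12, 4, 0, 0), 350 * MB),   # finished hours later
]
print(suspects_after_large_file(sample))
```

A flagged file is only a candidate for re-checking; two downloads can also finish close together by coincidence.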



gunnar 06/02/2007 11:55 pm
I got more RIMM-RAM installed, so there is 250MB of free available RAM and the "large file" is only 180MB, but it still caused a time-out on the other files I was downloading.

That is, the tmp-primary-final processing now goes really nicely, and Windows does not have to fool around with its swap file, but the other files that OEP was downloading "did not get served", and the C-SPAN server gave a time-out (downloaded..200 OK) and truncated these other (also archived, not live) files.

Strange stuff, because when OEP is finalizing files smaller than approximately 150MB the "other files" seem to be "getting served", with no time-outs for them.
That is, no timeout from this particular server??

However, it seems "this particular rtsp-server" has a really short time-out, maybe only some 30-50 seconds. With OEP not asking for new packets during that time, that is enough for the server to disconnect, and OEP then truncates and ends all the other files it was downloading??

Difficult to figure out, because I have to wait for some three hours and then catch those crucial seconds when OEP is doing the tmp-etc processing of that one "huge file" (which times out and truncates all the others that are being downloaded)

A true pain... Gunnar
Oleg Chernavin 06/05/2007 03:13 pm
Sorry for this pain. I have been improving this over several 4.x versions. I am reading the change history: for example, AVI file parsing was fixed in version 4.2, some other streams were fixed in version 4.7, and so on.

OE really needs to add the index table, but it is usually quite small relative to the size of the file. It also changes several bytes at the beginning of the RM file, but that is quick as well.

I would also suggest to look at the Options dialog - File Locations - make sure that Temp and Download Directories are on the same disk to minimize time to move the file.
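
The reason this matters: a move within one volume is normally just a rename (a directory entry change), while a move across volumes has to copy every byte. A quick way to check two folders is sketched below in Python. The function name and the folder names are placeholders, not OE settings, and comparing `st_dev` is an assumption that works where the operating system reports a distinct device ID per volume.

```python
import os
import tempfile

def on_same_volume(path_a, path_b):
    """True if both existing paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

# Demonstration with two throwaway folders under one parent directory.
with tempfile.TemporaryDirectory() as root:
    temp_dir = os.path.join(root, "temp")          # stands in for the Temp directory
    download_dir = os.path.join(root, "download")  # stands in for the Download directory
    os.mkdir(temp_dir)
    os.mkdir(download_dir)
    print(on_same_volume(temp_dir, download_dir))
```

If the check returns False for the real Temp and Download folders, each finished file is physically copied between disks, which is the slow case to avoid.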

If you don't plan to update files, descr.wd3 is not that necessary. They also contain MIME type of files, but if you don't use the files inside OE (export, browse, etc.), they can be also discarded.

The timeouts on various servers may be different - depending on their settings.

Anyway, can you please try to use the latest 4.7 SR1 version. Here is the most recent oe.exe file:

http://www.metaproducts.com/download/betas/OEP2613.ZIP

I just added a feature to compact used RAM every 8 minutes. Please let me know how it works.

Thank you!

Oleg.