Downloading PHP forums; OE not distinguishing between different files with same name

LegoDruid
08/07/2005 03:56 pm
I spent a while figuring out how to download PHP forums. The Q&A in the MPSC forums was very helpful. I can now constrain OE to follow just the sub-forum or even the thread that I want. This was handy for providing an example for the problem that I'm having now.

OE stores downloaded files in a single directory per server and directory name. For some forums, the images shown in a thread may be linked from different servers. In fact, this is pretty much the norm for the forum that I'm trying to download.

It's an art board. Someone posts a photograph (e.g. BOB.JPG) and other people submit their drawings from that reference. Many times, they use the same file name (e.g. BOB.JPG again). These images are linked from servers other than the forum host, but the filenames are the same.

I cannot seem to force OE to distinguish between different files with the same name. For example, if two people submitted a linked reference (on different servers) to a file of the same name, the OE download of that site will have clobbered one of the files. Both references will resolve to the first file, or both will resolve to the second file.

In case I'm not being clear enough, start with this URL:

http://www.sketchbooksessions.com/thedrawingboard/viewtopic.php?t=17348

The file that I'm using to test with is "Laetitia.jpg". Two different files with that name were linked from two different servers by two different posters. And yet OE will show only one version of the "Laetitia.jpg" file for both references. Essentially, the OE downloaded version of the forum is wrong.

Page 3. Posted: Mon May 17, 2004 7:19 am
Page 7. Posted: Sun Oct 03, 2004 10:18 am

I have tried all of the File Modification Checks under the Project tab.
I have tried activating File Copies under the Advanced tab.

In every test, OE still clobbers files. I am guessing that the File Copies option is for renaming files with the same name, between different download sessions. How can I rename files with the same name, during the SAME download session?
Oleg Chernavin
08/08/2005 07:49 am
One thing to try is to enable File Copies in the Project Properties and run Offline Explorer this way:

oe.exe /NoURLs

However, in this mode Offline Explorer may download the same links again and again, so you have to keep an eye on the Queue the whole time.

Best regards,
Oleg Chernavin
MP Staff
LegoDruid
08/08/2005 07:23 pm
> One thing to try is to enable File Copies in the Project Properties and run Offline Explorer this way: oe.exe /NoURLs

An interesting mode. It did something interesting to the HTML fragments that OE downloaded, but had no impact on anything else. The download folder is filling with many copies of the same HTML files:

download_20050808185652.php@id=6579
download_20050808185732.php@id=6579
download_20050808185826.php@id=6579
download_20050808190140.php@id=6579
download_20050808190332.php@id=6579
download_20050808190536.php@id=6579
download_20050808190542.php@id=6579
download_20050808190826.php@id=6579

But the latest-downloaded images are still over-writing earlier versions.

And the performance is unworkably poor. It has taken 20 minutes to run a project that, without that parameter, took less than two minutes (and it's not yet finished).

Is there anything else that I can do? Why doesn't the File Copies feature resolve this problem? I had imagined, on first reading the Help file, that this was precisely what the feature was for.
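
For reference, the copy names listed earlier in this post appear to follow the pattern "name, underscore, YYYYMMDDHHNNSS stamp, then the rest of the original name". The Python sketch below reproduces that pattern. It is only an illustration inferred from the listing, not Offline Explorer's actual code, and the function name copy_name is made up for this example.

```python
from datetime import datetime


def copy_name(filename: str, when: datetime) -> str:
    """Insert a 14-digit date/time stamp before the first dot of filename.

    Hypothetical helper: it mimics the names seen in the listing above,
    it is not how Offline Explorer is known to be implemented.
    """
    stem, dot, rest = filename.partition(".")
    return f"{stem}_{when:%Y%m%d%H%M%S}{dot}{rest}"


# One saved name and one download time taken from the listing above.
example = copy_name("download.php@id=6579", datetime(2005, 8, 8, 18, 56, 52))
print(example)
```

Because the stamp is the download time, fetching the same URL eight times gives eight differently named copies of the same page, which matches what the folder shows.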
Oleg Chernavin
08/09/2005 07:37 am
It is really strange that images get overwritten. It should not happen. I just checked it on another project, and the images are not overwritten there. Maybe all the images are simply in different folders?

Oleg.
LegoDruid
08/09/2005 06:37 pm
> It is really strange that images get overwritten. It should not happen. I just checked it on another project, and the images are not overwritten there. Maybe all the images are simply in different folders?

[1.] Function

Adding the /NoURLs parameter to OE.EXE seemed to activate the "Make Copies" feature, although badly: it was far too slow to use. And only the HTML files had a datestamp added to their file names. The image file names were unaffected.

[2.] Effect

The problem is that OE's offline version of the site has simplified the linkages for "Laetitia.jpg" to just "/thedrawingboard/Laetitia.jpg" in the download folder. The various "Laetitia.jpg" files actually come from different servers, but this fact is apparently disguised by the Download.PHP convention.

Even if there were "Laetitia.jpg" files in other folders, the HTML copy that OE makes of the PHP forums directs all "Laetitia.jpg" file references to "/thedrawingboard/Laetitia.jpg". This means that every reference - via Download.PHP - to "Laetitia.jpg" looks the same. Since I'm trying to use OE to monitor an art forum, it's rather a problem if everyone's sketch submissions look the same. ;+)

[3.] What Next?

Are you interested enough in the problem to try this yourself? The base URL is:

http://www.sketchbooksessions.com/thedrawingboard/viewtopic.php?t=17348

No Level Limit.
I'm downloading TEXT, IMAGES and USERDEFINED.
TEXT is set to use URL Filters.
IMAGES are Load from Any Site.
USERDEFINED will load from the Starting Server.
Protocols are HTTP, HTTPS and FTP.

URL Filters
Load files only within the starting server
Load up to [1] Links on other servers
Load files only within the starting directory and below
Filenames [CUSTOM] Download.php and Viewtopic.php?17348

On-line translation
Keep old files [99] copies
Format [Filename_7.htm]
Use date / time [YYYYMMDDHHNNSS]

I earlier referenced the posts within that particular thread where you can see OE fumbling the "Laetitia.jpg" links.
Oleg Chernavin
08/10/2005 07:17 am
I looked carefully and I can say that you don't need file copies for that forum. Since the images are from different servers, OE will save them to different locations on your disk.

Oleg.
Oleg Chernavin
08/10/2005 07:40 am
I am sorry, I overlooked one thing. To fix the download you don't really need File Copies. Instead, please download the updated version here:

http://www.metaproducts.com/download/betas/oee2093.zip

Also, please change the URLs field of your Project by adding the following line after the URL:

Additional=SkipDisposition

Then click the OK button and redownload the forum.

Oleg.
LegoDruid
08/10/2005 11:34 am
> http://www.metaproducts.com/download/betas/oee2093.zip
> Additional=SkipDisposition

First, thanks for sticking with me on this. I appreciate you spending time on this problem.

I downloaded the update and added the new parameter. Following, I turned off the File Copies mode and re-ran the project. The "Laetitia.jpg" file is displaying properly for the two references that I listed above!

Very interesting: there are no longer any images in the base /thedrawingboard/ folder.

I used "Save Target As" on both references, and they both defaulted to DOWNLOAD.JPG. Were the images downloaded to my HD by OE during the project event? Or are they downloaded from the servers after the OE download, when I use "Save Target As"?
Oleg Chernavin
08/10/2005 01:43 pm
Some servers have this feature: the URL looks like .../download.php?id=..... , but it actually returns an image whose name is download.jpg or laetitia.jpg or something else. So the URLs can be different while the actual image name is the same for several URLs. This caused Offline Explorer to overwrite files.

The command I just added to Offline Explorer stops it from saving images under these duplicated filenames. They will be saved as download.php@id=.... This looks a bit weird, but at least it will make a unique filename for each of the images.
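
To make the difference concrete, here is a minimal Python sketch of the two naming strategies. It assumes the image name is delivered in a Content-Disposition response header, which the option name SkipDisposition suggests but this thread does not spell out. The host forum.example, the ids 101 and 102, and both function names are invented for the example; this is not Offline Explorer's code.

```python
from email.message import Message
from typing import Optional
from urllib.parse import urlsplit


def name_from_disposition(header: str) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition header value."""
    msg = Message()
    msg["Content-Disposition"] = header
    return msg.get_filename()


def name_from_url(url: str) -> str:
    """Build a local name from the URL itself: last path part, '@', query."""
    parts = urlsplit(url)
    name = parts.path.rsplit("/", 1)[-1]
    return f"{name}@{parts.query}" if parts.query else name


# Two hypothetical responses: different URLs, same server-supplied name.
responses = [
    ("http://forum.example/board/download.php?id=101", 'inline; filename="Laetitia.jpg"'),
    ("http://forum.example/board/download.php?id=102", 'inline; filename="Laetitia.jpg"'),
]

by_header = {name_from_disposition(header) for _, header in responses}
by_url = {name_from_url(url) for url, _ in responses}

print(sorted(by_header))  # a single name, so the second file replaces the first
print(sorted(by_url))     # one distinct name per URL
```

Naming by header keeps the friendly name but collapses both images onto one file; naming by URL keeps them apart at the cost of names like download.php@id=101.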

Oleg.
LegoDruid
08/10/2005 08:47 pm
> Some servers have this feature: the URL looks like .../download.php?id=..... , but it actually returns an image whose name is download.jpg or laetitia.jpg or something else. So the URLs can be different while the actual image name is the same for several URLs. This caused Offline Explorer to overwrite files.

The offline version of the PHP forums that OE is making is now accurate. The updated *.EXE and the switch resolved the image-clobbering problem. This is excellent. Thanks!

But: when I try to save the images that are framed with /download.php?id, the filenames default to Download.jpg. All of them. Files that I know are named Laetitia.jpg are coming up as Download.jpg.

For viewing the forums, this is not a problem. For collecting (manually, with <right-click> Save-As) the images that are posted to the forums, this is not so good. Every new image that I try to save to an artist's folder will be called Download.jpg. The next image that I try to save will also be called Download.jpg. Is it the same image, or a new one?

* Before, OE reported the proper filename for images framed with /download.php?id but it clobbered older image files.

* Now OE can distinguish between different images with the same (original) name, but it renames everything to Download.jpg.

Can OE do both things? Distinguish between files and still keep the original file names? Am I asking for too much? ;+)

Oleg Chernavin
08/13/2005 11:41 am
I think I have fixed this. Now you can use File Copies and the files will be named as you wish. Here is the update:

http://www.metaproducts.com/download/betas/oee2096.ZIP

Oleg.
LegoDruid
08/13/2005 09:21 pm
> I think I have fixed this. Now you can use File Copies and the files will be named as you wish.

I'll give that a try; thanks.
Oleg Chernavin
08/14/2005 03:24 am
Did it help?

Oleg.
LegoDruid
08/14/2005 09:55 pm
> Did it help?

Sorry for not getting back to you sooner. I didn't get a chance to test this until tonight.

Unless I'm doing something wrong, the recent update did not resolve the problem.

I installed the 2096 binary and re-ran the Laetitia test. The configuration still includes "Additional=SkipDisposition". All images framed with the Download.php widget still default to Download.jpg when I choose either "Save Target As" or "Save Picture As".
Oleg Chernavin
08/15/2005 02:11 am
Please remove the "Additional=SkipDisposition" line.

Oleg.
LegoDruid
08/15/2005 12:01 pm
> Please remove the "Additional=SkipDisposition" line.

Thanks! I'll try this out tonight.
LegoDruid
08/15/2005 12:33 pm
> Please remove the "Additional=SkipDisposition" line.

Without the "Additional=SkipDisposition" line, OE's behaviour has reverted to its initial state. It is clobbering earlier files (from different servers) with later downloads of the same name. The "Save As" defaults for the offline forum copy are now correct, but the wrong images are displaying.

I checked the OE binary, and it's 2096. I ran the Laetitia test twice, just to be sure. Am I doing something wrong? I've been using the same settings for this test (with the exception of the "Additional=SkipDisposition" line) all along.
Oleg Chernavin
08/15/2005 12:51 pm
If you have File Copies enabled, then OE will create copies of these duplicate files. Links will point to the latest copy of an image only, but all images will be preserved. I am sorry, this is the best I can do now. You will have to explore all images directly from the disk.
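
For exploring the copies on disk, a small script can group them by their original name. The Python sketch below is only an illustration: it assumes each copy carries a 14-digit date/time stamp after an underscore, as in the download_20050808185652.php@id=6579 names reported earlier in this thread. The stamped Laetitia names in the example are invented, and group_copies is not part of Offline Explorer.

```python
import re
from collections import defaultdict

# "<stem>_<14 digits><rest>", e.g. download_20050808185652.php@id=6579
STAMPED = re.compile(r"^(?P<stem>.+?)_(?P<stamp>\d{14})(?P<rest>\..*)?$")


def group_copies(filenames):
    """Map each original filename to its copies, ordered by stamp.

    A name without a stamp sorts first; whether that file is the oldest or
    the newest copy is not stated in this thread.
    """
    groups = defaultdict(list)
    for name in filenames:
        match = STAMPED.match(name)
        if match:
            original = match.group("stem") + (match.group("rest") or "")
            groups[original].append((match.group("stamp"), name))
        else:
            groups[name].append(("", name))
    return {orig: [n for _, n in sorted(items)] for orig, items in groups.items()}


# Hypothetical folder contents.
files = [
    "Laetitia_20050815120102.jpg",
    "Laetitia.jpg",
    "Laetitia_20050815115900.jpg",
    "download_20050808185652.php@id=6579",
]
for original, copies in group_copies(files).items():
    print(original, copies)
```

In practice the list of names would come from the project's download folder, for example via os.listdir.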

Oleg.
LegoDruid
08/15/2005 03:39 pm
> If you have File Copies enabled, then OE will create copies of these duplicate files. Links will point to the latest copy of an image only, but all images will be preserved. I am sorry, this is the best I can do now. You will have to explore all images directly from the disk.

Thanks for spending time on this. I appreciate it. However, the proposed workaround just isn't practical for my intended application. Perhaps this functionality will appear in a future version of OE.
Oleg Chernavin
08/17/2005 03:12 am
Yes, maybe so.

Oleg.