Exported URLs

Author Message
Neil 08/05/2008 03:47 pm
When I export a project and select "add original url to file" it doesn't add the original url, but a lowercase version. For most sites this does not matter, but a few have case-sensitive urls, so the lowercase url does not work. Is there any chance this will be fixed in some future update?

PS - I am still using OE 4.9, so if this has been fixed in version 5 I apologise.
Oleg Chernavin 08/07/2008 10:22 am
Sorry, this is not yet done.

Best regards,
Oleg Chernavin
MP Staff
Neil 08/08/2008 03:41 pm
> Sorry, this is not yet done.
>
> Best regards,
> Oleg Chernavin
> MP Staff

Okay, is this planned for future versions / upgrades? Because urls that don't work are not much use.
Oleg Chernavin 08/10/2008 08:57 am
Well, although it may look like a simple improvement, it will require quite serious internal changes. This is why I do not know yet when exactly I will be able to do it.

Oleg.
Neil 09/02/2008 03:24 pm
> Well, although it may look like a simple improvement, it will require quite serious internal changes. This is why I do not know yet when exactly I will be able to do it.
>
> Oleg.

Given the correct case-sensitive urls appear to already be in the Descr.WD3 file, yes it does seem a relatively simple improvement to add those to exported files rather than a lowercase version.

And while I do on the whole find Offline Explorer Pro to be excellent (sorry, I've posted in the wrong forum), I find it slightly bizarre that version 5 introduces a host of, dare I say, relatively unimportant UI improvements, yet leaves the key functionality that I bought the Pro version for, exporting pages, useless for any site that uses case-sensitive urls.

Whilst I appreciate you are not sure when you will be able to solve this, I am really looking for a rough guide to whether it will be solved in the foreseeable future or not, as at least I can then decide either to hang on for a while or to find another software solution if the problem is going to remain for some time.

Regards


Neil
Oleg Chernavin 09/03/2008 08:12 am
Neil, you are perfectly right. This info is available in descr.wd3 files. I was able to do this trick using them. Please test the updated version (oe.exe file):

http://www.metaproducts.com/download/betas/OEP2824.ZIP

Oleg.
Neil 09/03/2008 10:51 am
I've tested it and it works perfectly. As usual, your support is excellent.

Thanks.

Neil.

Oleg Chernavin 09/03/2008 01:26 pm
You are welcome!

Oleg.
Neil 09/04/2008 04:41 pm
Oleg, sorry to be a nuisance, but it seems I was incorrect when I said it worked perfectly and also when I said the information was stored in the descr.wd3 files. It seems the information is only there when the downloaded files are named after the URL. When the downloaded files are named "default.htm" (the ones where the url ends with a slash?), all that is in the descr.wd3 files is "default.htm".

So when exported, any pages downloaded and saved as default.htm still produce the lower-case URL, but now with "default.htm" appended.

For example, the url appended to the exported file for http://stuff.tv/Review/Sony-Ericsson-Xperia-X1/ becomes http://stuff.tv/review/sony-ericsson-xperia-x1/default.htm.

(In this case the fact it is lowercase doesn't really matter as this site doesn't use case-sensitive urls, it was simply the first example I found. It does however have default.htm appended to it, which isn't so good.)

So I guess there is no easy solution, although I assume you must have the correct-case urls somewhere, as OE must use them to successfully crawl case-sensitive sites. I guess I will have to try some other software.

Regards

Neil

Oleg Chernavin 09/05/2008 04:48 am
Yes, you are right. I am really sorry. I do not have an immediate solution to this.

Oleg.
Neil 09/05/2008 05:54 pm
Luckily I have found a rather long-winded solution to the problem when I process the exported files. So for anyone else who exports files for processing and sometimes requires case-sensitive urls, it goes something like this:

1 - You need the IID value for any project whose exported files you process. These can be found and extracted through regular expressions from the WebDown.dat file, which is located in the application data folder. In Windows it is something like - C:\Documents and settings\**your user name***\applications data\offline explorer\

2 - Once you have the IID value for the project you are processing exported pages for, you can simply match that up to the relevant .map file, also in the application data folder. So if the IID of the site is 7020 then you want the 7020.map file.

3 - The map file contains a list of all the urls in that project's map (and these are the correct case-sensitive urls!), so all that needs to be done is a case-insensitive match between the lowercase exported url and the url in the map file, and then extract the matching url from the map file.
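
For anyone who wants to script step 3, here is a rough Python sketch of the case-insensitive match. It is only an illustration under some assumptions: that the contents of the .map file can be read as text and that the urls in it can be picked out with a simple pattern. The exact layout of the .map file is not described here, so URL_PATTERN, build_case_index and restore_case are hypothetical names made up for this example. It also tries the url with the trailing "default.htm" removed, since that is what gets appended on export for pages saved as default.htm.

```python
import re

# Assumption: urls inside the .map file can be found with a simple pattern.
# The real file layout is not documented in this thread.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def build_case_index(map_text):
    """Map each lowercased url found in map_text to its original-case form."""
    index = {}
    for url in URL_PATTERN.findall(map_text):
        # Keep the first spelling seen if two urls differ only by case.
        index.setdefault(url.lower(), url)
    return index


def restore_case(exported_url, index, default_name="default.htm"):
    """Return the original-case url for a lowercase exported url, or None."""
    key = exported_url.lower()
    if key in index:
        return index[key]
    # Pages saved as default.htm are exported with that name appended,
    # so retry with it stripped (the trailing slash is kept).
    if key.endswith("/" + default_name):
        trimmed = key[: -len(default_name)]
        if trimmed in index:
            return index[trimmed]
    return None


# Example with made-up map contents, using the url mentioned earlier.
map_text = "http://stuff.tv/Review/Sony-Ericsson-Xperia-X1/\n"
index = build_case_index(map_text)
exported = "http://stuff.tv/review/sony-ericsson-xperia-x1/default.htm"
print(restore_case(exported, index))
```

Building the lowercase index once and looking each exported url up in it avoids rescanning the map file for every exported page. If two urls in the map differ only by case, this sketch keeps the first one it sees, which may or may not be the one you want.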


Oleg Chernavin 09/06/2008 05:31 am
It is good that at least this way works. I will continue thinking about how to improve this and make it totally automatic.

Oleg.