Any chance of having KML/KMZ files parsed?

Alex 08/11/2010 08:28 am
Hi there.

Is there any way to have .KML/.KMZ files parsed for the URLs they contain?

KML is XML with its own internal structure.
KMZ is simply a zipped .KML file.

Say I start a project with the URL http://mw2.google.com/mw-earth-vectordb/gallery_layers/ngm/zipusa/2009_01_09/ru/root.kmz

What should I do next?

1. Take the next URL from the queue and download the file from the Internet.
2. Unzip it if it is a KMZ. If it is a KML, skip to the next step.
3. Parse the file for URLs (treat it as a standard XML file).
4. Add any URLs found to the queue.
5. Repeat steps 1-4 until the queue is exhausted.
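The five steps above can be sketched in Python as a breadth-first crawl. This is a minimal sketch under stated assumptions, not how Offline Explorer works internally: unpack, hrefs and crawl are hypothetical names, fetch is any caller-supplied function that downloads a URL and returns its bytes, and only <href> elements are treated as links (URLs hidden elsewhere, such as HTML inside <description>, are ignored).

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import deque
from urllib.parse import urldefrag, urljoin, urlsplit


def unpack(data):
    """Step 2: return (kml_documents, archive_member_names).
    A plain KML yields ([data], empty set); a KMZ is unzipped first."""
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as kmz:
            names = kmz.namelist()
            docs = [kmz.read(n) for n in names if n.lower().endswith(".kml")]
            return docs, set(names)
    return [data], set()


def hrefs(kml):
    """Step 3: treat the KML as ordinary XML and collect the text of every <href>.
    Assumption: only <href> elements hold links worth following."""
    root = ET.fromstring(kml)
    return [e.text.strip() for e in root.iter()
            if e.tag.rsplit("}", 1)[-1] == "href" and e.text and e.text.strip()]


def crawl(start_url, fetch, max_files=10_000):
    """Steps 1-5 as a breadth-first crawl. fetch(url) -> bytes is supplied by the
    caller; returns {url: original_bytes} with nothing modified."""
    queue, seen, saved = deque([start_url]), {start_url}, {}
    while queue and len(saved) < max_files:
        url = queue.popleft()                      # step 1: next URL from the queue
        data = fetch(url)
        saved[url] = data                          # keep the file exactly as downloaded
        if not urlsplit(url).path.lower().endswith((".kml", ".kmz")):
            continue                               # images etc. are saved, not parsed
        docs, members = unpack(data)               # step 2
        for doc in docs:
            try:
                links = hrefs(doc)                 # step 3
            except ET.ParseError:
                continue                           # skip malformed KML
            for href in links:
                if href in members:
                    continue                       # already packaged inside this KMZ
                nxt, _ = urldefrag(urljoin(url, href))
                if nxt.startswith(("http://", "https://")) and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)              # step 4
    return saved                                   # step 5: stops when the queue is empty
```

Relative links are resolved against the URL of the file they appear in, and links that name a file packaged in the same KMZ are skipped, since those already arrived inside the archive.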

How can I do something like this, please? :)

Thanks,
Alex
Oleg Chernavin 08/11/2010 09:22 am
I looked at the KMZ file. It contains only 2 links to images (.png); all the other referenced KML files are contained in the ZIP itself. What is the purpose of parsing the contained files?

Is there an example of a KMZ file that actually links to KML files outside the archive?
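Whether a KMZ links outside itself can be checked mechanically by comparing the archive's member names with the <href> values in its KML files. A minimal sketch, assuming a reference counts as "contained" only when its text exactly matches a member name (relative paths used by KML files in subfolders are not normalized); classify_kmz_links is a hypothetical helper.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def classify_kmz_links(kmz_bytes):
    """Split the <href> values found in a KMZ's KML files into references to files
    packaged inside the archive and references to anything outside it."""
    packaged, external = set(), set()
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as kmz:
        members = set(kmz.namelist())
        for name in kmz.namelist():
            if not name.lower().endswith(".kml"):
                continue
            for elem in ET.fromstring(kmz.read(name)).iter():
                # Compare local tag names so the KML namespace does not matter.
                href = (elem.text or "").strip()
                if elem.tag.rsplit("}", 1)[-1] != "href" or not href:
                    continue
                # Assumption: exact string match against archive member names.
                (packaged if href in members else external).add(href)
    return packaged, external
```

A non-empty external set means the KMZ references files that a crawler would have to fetch separately.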

Best regards,
Oleg Chernavin
MP Staff
Alex 08/12/2010 04:49 am
That one is just a single file. :)
Others have TONS of internal links (mostly to other KML/KMZ files stored in different folders on the same server). Also, there may be more than one KML file inside a KMZ package. See this one:
http://mw2.google.com/mw-ocean/ocean/kml/ark/en/0/root.kmz

PS: And yes, I need all those images (the ones you mentioned) as well. In fact, I need ALL the resources behind all the links inside the KML/KMZ files (images and all the KML/KMZ files referring to each other), to keep the whole file structure locally on my server's HDD (for caching purposes).
Oleg Chernavin 08/12/2010 03:38 pm
I see. Yes, this one contains links to external .kmz files, like http://mw2.google.com/mw-ocean/ocean/kml/ark/en/0/03232.kmz

I will try to work on this, but I am not sure how much time it will take. It is not easy, because I have to unpack the file and then pack it back after the URLs are changed to offline ones.
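The unpack, rewrite, repack cycle described here could look roughly like this. It is an illustration only, not Offline Explorer's implementation: repack_with_offline_links and its to_offline callback are hypothetical, and the regular expression recognizes only unprefixed <href> tags.

```python
import io
import re
import zipfile

# Matches <href>...</href> with an unprefixed tag; a simplification for the sketch.
HREF = re.compile(rb"(<href>\s*)(.*?)(\s*</href>)", re.DOTALL)


def repack_with_offline_links(kmz_bytes, to_offline):
    """Unpack a KMZ, rewrite every <href> in its .kml members with to_offline(url)
    (a hypothetical caller-supplied mapping), and pack everything back;
    non-KML members are copied unchanged."""
    def rewrite(match):
        new_url = to_offline(match.group(2).decode("utf-8"))
        return match.group(1) + new_url.encode("utf-8") + match.group(3)

    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as src, \
         zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            data = src.read(info)
            if info.filename.lower().endswith(".kml"):
                data = HREF.sub(rewrite, data)
            dst.writestr(info, data)   # reuses each member's name and compression type
    return out.getvalue()
```

Every member is read and written again, which is why this rewrite step costs more than simply parsing the archive.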

Oleg.
Alex 08/13/2010 12:56 am
No, no, I don't need them to be changed.
I need to duplicate the file structure of the selected server (say, http://mw2.google.com/).
Most of the URLs in the KML files are relative (like "pics/icon.png" rather than "http://mw2.google.com/pics/icon.png"), so there is no need to modify the KML itself.
If there is an absolute path to a different server, just add it to the queue as it is, and OE will download it to a different folder (say, /Download/mw3.google.com/* instead of /Download/mw2.google.com/*) without any modification of the KMLs. I need them in their original state, just to mirror them as they are at their original location. And my Squid will deal with all the absolute URLs once they are in the /cache folder, don't worry about that. :)

So, the idea is to mirror the whole KML/KMZ structure (including all cross-linked files) from the server to my HDD, without modifying the files themselves. Just get the KML, save it, [unpack and] open the file, parse it for the next links and drop it. Nothing in the files needs to be modified.


PS: Or maybe you have another idea for doing this? Can scripts be attached to OE to do this kind of job? I'm familiar with Perl, and part of the job could be scripted (like unzipping the files), etc. I just have no idea how to attach a script to OE, other than the proxy-script way... :(
Oleg Chernavin 08/26/2010 12:33 pm
Well, there is no such scripting support yet. I will think about adding it in version 6.0. I am quite loaded with lots of development work right now.

Oleg.
Yury 08/31/2012 08:17 am
Greetings, Oleg!
It looks like KML parsing is not a priority task? That's a pity, since it would make it possible to download Panoramio. Maybe you will find the time?

http://www.panoramio.com/panoramio.kml?LANG=ru_RU.utf8&BBOX=8.964843750,42.293564192,9.140625000,42.423456518
Oleg Chernavin 08/31/2012 08:19 am
I added parsing of KML files (only non-compressed files, so far). Here is the updated oe.exe file (Professional Edition):

http://www.metaproducts.com/download/betas/OEP3836.zip
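Because this beta handles only uncompressed KML, a pre-processing step could sniff each download and hand a parser plain KML either way. A minimal sketch, assuming you control such a step yourself (for example in a helper script); as_plain_kml is a hypothetical name, and choosing doc.kml (else the first .kml entry) as the main file is a convention this sketch adopts, not something Offline Explorer is known to do.

```python
import io
import zipfile

ZIP_SIGNATURE = b"PK\x03\x04"   # local file header that starts a normal ZIP/KMZ


def as_plain_kml(data):
    """Return KML bytes a parser for uncompressed KML can read: plain KML passes
    through unchanged, a KMZ is recognized by its ZIP signature and unpacked."""
    if not data.startswith(ZIP_SIGNATURE):
        return data
    with zipfile.ZipFile(io.BytesIO(data)) as kmz:
        kml_names = [n for n in kmz.namelist() if n.lower().endswith(".kml")]
        if not kml_names:
            raise ValueError("KMZ archive contains no .kml file")
        # Prefer the conventional doc.kml; otherwise take the first .kml entry.
        main = "doc.kml" if "doc.kml" in kml_names else kml_names[0]
        return kmz.read(main)
```

Checking the content rather than the file extension means the decision does not depend on how the URL is spelled or what query string it carries.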

Oleg.