|Andre Hervieux||10/04/2005 02:56 pm|
|The project URL is: http://www.tsnhorse.com/cgi-bin/instant.cgi?type=inc
I am trying to download only the files on that page for a specific date, example of a URL:
Here are the settings:
Level limit: 1
Download all files
File Filters properties:
Only the Other box is checked:
and Location is: Load using URL filters settings, and there are no file size restrictions.
Protocol, server, directory are set to all.
Now for the filenames include list: &date=2005-10-01
There are no exclusions.
Everything looks OK when you look at the queue of files being downloaded: in this example, 38 files, corresponding to the date string.
But when you try to export the data, NOTHING!
There is nothing in the downloaded directory to export; nothing was created!
In an older version, when I exported the data, all the links were in HTML format, and I could then convert the files to text.
Why is there no data to export ?!?
|Oleg Chernavin||10/05/2005 06:45 am|
|Please check the File Filters | Text category as well. If it is unchecked, Offline Explorer will not save downloaded HTML files. This is useful when you need to load only images or video files and not keep Web pages.
|Andre Hervieux||10/05/2005 11:52 am|
|Checking the File Filters | Text category box solved part of the problem: the files are now exported and can be viewed in Internet Explorer.
But I cannot parse the text using an HTML-to-text converter, because when you look at the source of a page downloaded with Offline Explorer, the format is one big paragraph of HTML code.
When you open the same page with Microsoft IE and save it as htm/html only, the source preserves the proper HTML formatting, so an HTML-to-text parser works.
Exemple of a URL:
What I need is to save that page with all the HTML tags in the right place (with all the text info, if you like), so that my HTML-to-text program can parse it afterward.
Any solution ?!?
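As a side note, an HTML-to-text pass that walks the parsed tag tree rather than the raw source text is insensitive to line layout, so a page saved as one long line extracts the same as a pretty-printed one. A minimal sketch using Python's standard html.parser (the sample markup is invented for illustration):

```python
# Tree-based HTML-to-text extraction: works the same whether the source
# is pretty-printed or crammed onto a single line, since only the parsed
# tags and text nodes matter. Tag handling here is deliberately minimal.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    SKIP = {"script", "style"}  # tags whose content we drop entirely

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skipping = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skipping += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skipping:
            self.skipping -= 1

    def handle_data(self, data):
        if not self.skipping and data.strip():
            self.parts.append(data.strip())

    def text(self):
        return " ".join(self.parts)

# Works even when the whole document is one line:
p = TextExtractor()
p.feed("<html><body><p>Race 1</p><p>Race 2</p></body></html>")
print(p.text())  # Race 1 Race 2
```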
|Andre Hervieux||10/05/2005 04:04 pm|
|I've done a little bit of testing:
Only checking the htm/html in the File Filters | Text category improved results.
I checked the source of the downloaded file, and surprise! Offline Explorer puts those two lines of code:
<!-- saved from url=(0089)http://www.tsnhorse.com/cgi-bin/instant.cgi type=inc&country=usa&track=bm&date=2005-10-01 -->
These lines in the downloaded source file(s) are responsible for the HTML-to-text parsing bug.
Remove those two lines, and presto! No more problems when parsing the file(s).
Is there a way to make Offline Explorer not add those two lines?
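For anyone who cannot disable the marker at download time, those lines can also be stripped before parsing. A minimal sketch in Python; the regular expression is an assumption modeled on the comment quoted above, and the sample input is invented:

```python
# Strip "<!-- saved from url=(NNNN)... -->" marker lines before feeding
# the file to an HTML-to-text parser. The pattern is an assumption based
# on the comment format shown in this thread.
import re

SAVED_FROM = re.compile(r"<!--\s*saved from url=\(\d+\).*?-->\s*", re.DOTALL)

def strip_saved_from(html: str) -> str:
    return SAVED_FROM.sub("", html)

src = '<!-- saved from url=(0014)http://a.b/c -->\n<html><body>ok</body></html>'
print(strip_saved_from(src))  # <html><body>ok</body></html>
```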
|Oleg Chernavin||10/06/2005 06:59 am|
|This is easy to fix. When you export the site, uncheck the Add Original URL box in the Export dialog.
|Andre Hervieux||10/06/2005 08:16 am|
|Yes, that fixed the parsing bug.
Your software is simply the best available!
Thanks for your help!
|Oleg Chernavin||10/06/2005 08:36 am|