pdf download problem

MJenny
09/28/2005 06:00 am
Hello,

I have a problem with a site's PDF files. The site switched from doc format to PDF format partway through. I was able to retrieve the doc files with URL macros without a problem, but not a single PDF has been downloaded. I added the file type pdf, but it doesn't work. The only way I can get the PDFs downloaded is by explicitly giving the complete path and file name. However, I can't do that a thousand times.

See for an example:
http://www.parlament.gv.at/pls/portal/docs/page/PG/DE/XXII/J/J_02331/fname_030651.pdf

I want all PDFs in the directories below the J level. Up to J_00055 they used Word documents with an identical name (daten_000000.doc); since then the PDF name changes each time because it includes a counter. The doc files were downloaded with this macro:
SINGLEURL=http://www.parlament.gv.at/pls/portal/docs/page/PG/DE/XXII/J/J_0000{:0..9}/daten_000000.doc

Unfortunately, this one didn't work for the PDFs:
SINGLEURL=http://www.parlament.gv.at/pls/portal/docs/page/PG/DE/XXII/J/J_0000{:56..99}/
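To see what such a macro actually requests, here is a minimal Python sketch of range expansion. It is not Offline Explorer's implementation: it assumes that `{:START..END}` is replaced by every number in the range, zero-padded to the width of START, and the `expand` helper is a made-up name.

```python
import re
from itertools import product

# Assumed macro shape: {:START..END}. The width of START sets the zero padding.
MACRO = re.compile(r"\{:(\d+)\.\.(\d+)\}")

def expand(url):
    """Return every URL produced by substituting each macro range."""
    # re.split with two groups yields [text, start, end, text, start, end, text, ...]
    parts = MACRO.split(url)
    literals = parts[0::3]
    ranges = [
        [str(n).zfill(len(start)) for n in range(int(start), int(end) + 1)]
        for start, end in zip(parts[1::3], parts[2::3])
    ]
    urls = []
    for combo in product(*ranges):
        pieces = [literals[0]]
        for value, literal in zip(combo, literals[1:]):
            pieces.append(value + literal)
        urls.append("".join(pieces))
    return urls

base = "http://www.parlament.gv.at/pls/portal/docs/page/PG/DE/XXII/J/"
urls = expand(base + "J_0000{:56..99}/")
print(len(urls))   # 44
print(urls[0])     # http://www.parlament.gv.at/pls/portal/docs/page/PG/DE/XXII/J/J_000056/
```

Under these assumptions the macro only produces directory URLs, so no PDF file name is ever requested unless the server returns a listing for the directory. Also note that the prefix J_0000 followed by a two-digit value gives six digits (J_000056), while the example directory J_02331 has five, so the prefix may need one zero fewer.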

Hope you can help.

Thanks in advance

MJenny
Oleg Chernavin
09/28/2005 06:10 am
I tried several possible PDF URLs, but none of them were valid, so I don't know whether the file names follow a pattern that URL macros could reproduce.

You can try to load all possible URLs:

http://www.parlament.gv.at/pls/portal/docs/page/PG/DE/XXII/J/J_0{:0000..9999}/fname_0{:00000..99999}.pdf

But this will take too much time. Are there any Web pages that contain links to PDF files?
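A rough count shows why. The Python sketch below multiplies the two ranges from the macro above; the request rate is an assumed figure for illustration only, not a measurement.

```python
def range_size(start, end):
    """Number of values in an inclusive macro range."""
    return end - start + 1

directories = range_size(0, 9999)    # J_0{:0000..9999}
files = range_size(0, 99999)         # fname_0{:00000..99999}
total = directories * files
print(total)                         # 1000000000

REQUESTS_PER_SECOND = 10             # assumption, purely illustrative
days = total / REQUESTS_PER_SECOND / 86400
print(round(days, 1))                # 1157.4
```

Even at that assumed rate, trying every combination would take roughly three years, and almost all requests would fail.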

Best regards,
Oleg Chernavin
MP Staff
MJenny
09/28/2005 06:47 am
Hello,

thanks for the very quick response!

Here is an example of a page that contains two PDF links: http://www.parlament.gv.at/portal/page?_pageid=908,964749&_dad=portal&_schema=PORTAL

MJenny
MJenny
09/28/2005 06:57 am
Your proposal made me think of the following addition for the next version of Offline Explorer Pro:

What about adding a success counter to URL macros?
E.g., you know there is a file whose name falls in the second macro's range 1000-2000, but you don't know the exact number. The macro runs through the addresses until it hits a valid one (the number of valid addresses to look for should be a user setting). It then stops the counter for the second macro, adds 1 to the first macro, and runs through the second macro's range again until the next hit, and so on. A hierarchy of macro counters would help here.
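The idea can be sketched in Python as a nested loop that leaves the inner range early. This is only an illustration of the proposal, not an existing Offline Explorer feature; `search`, `exists` and `hits_wanted` are hypothetical names, and `exists` stands in for whatever check decides that an address is valid.

```python
def search(outer, inner, exists, hits_wanted=1):
    """For each outer value, scan the inner range until enough hits are found."""
    found = []
    for a in outer:
        hits = 0
        for b in inner:
            if exists(a, b):
                found.append((a, b))
                hits += 1
                if hits >= hits_wanted:
                    break  # stop the inner counter, move on to the next outer value
    return found

# Stand-in for real requests: a fixed set of (directory, file number) pairs.
known = {(56, 1003), (57, 1500), (57, 1600)}
result = search(range(56, 59), range(1000, 2001), lambda a, b: (a, b) in known)
print(result)  # [(56, 1003), (57, 1500)]
```

With `hits_wanted=1` the inner scan ends at the first valid address per directory, which saves the most requests when valid numbers tend to sit early in the range. A directory with no valid file still costs a full pass over the inner range.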

MJenny
Oleg Chernavin
09/29/2005 07:29 am
I created a Project with the URL:

http://www.parlament.gv.at/portal/page?_pageid=908,964749&_dad=portal&_schema=PORTAL

I added the PDF extension to the File Filters | User Defined section and changed its Location to Load from any site. With Level=1, all PDFs loaded fine.

I think the best approach is to download the whole site this way, so that all PDFs are found through the pages that link to them.
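What following links one level deep amounts to can be sketched in Python with the standard library. This is only an illustration, not how Offline Explorer works internally; `PdfLinks` and `pdf_links` are made-up names, and the HTML snippet is invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class PdfLinks(HTMLParser):
    """Collect absolute URLs of <a href> targets whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            absolute = urljoin(self.base_url, href)
            if urlparse(absolute).path.lower().endswith(".pdf"):
                self.links.append(absolute)

def pdf_links(html, base_url):
    parser = PdfLinks(base_url)
    parser.feed(html)
    return parser.links

# Invented sample page; real pages would be fetched first.
page = '<a href="/pls/portal/docs/a.pdf">A</a> <a href="page?x=1">next</a>'
print(pdf_links(page, "http://www.parlament.gv.at/portal/page?_pageid=1"))
# ['http://www.parlament.gv.at/pls/portal/docs/a.pdf']
```

The file names come from the pages themselves, so no guessing of counters is needed.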

Oleg.
MJenny
09/30/2005 02:30 am
Hello Oleg,

thanks.

The method works, but its drawback is that this is just one page address out of several thousand.
I would first need to find all of them and then add them to the URL list.

Ideally the download should start from this page
http://www.parlament.gv.at/portal/page?_pageid=908,98803&_dad=portal&_schema=PORTAL&P_NR=XXII
which is one level up and lists all cases I'm interested in.

However, starting from that address, I'm unable to retrieve any PDFs.
MJenny
Oleg Chernavin
09/30/2005 03:07 am
OK. Change that Project's URL to:

http://www.parlament.gv.at/portal/page?_pageid=908,98803&_dad=portal&_schema=PORTAL&P_NR=XXII

Increase Level to 2, click the OK button and redownload the Project.

Oleg.
MJenny
09/30/2005 05:04 pm
Hi again,

Unfortunately, that doesn't work. That was also my first try.

MJenny
Oleg Chernavin
10/03/2005 03:28 am
Can you please select that Project, click the Copy button on the toolbar and then paste it into a forum message? I will see what is wrong with the Project settings.

Oleg.
MJenny
10/04/2005 02:58 am
Ok, here it is:

[Object]
OEVersion=Pro 3.8.0.2028
Type=0
IID=7119
Caption=TEST Schriftliche Anfragen 22.GP
URL=http://www.parlament.gv.at/portal/page?_pageid=908,98803&_dad=portal&_schema=PORTAL&P_NR=XXII
Lev=2
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FMGroup=1
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3pdf ooooooooooooooooooox
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovfliflcvivrmramrvasfasxwmvm1vm2vvob
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejar
FTUDef.Exts=jscssssivbsdtdxslswf
FTText.B=ooxooo
FTImages.B=xoxooo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=xoxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,3,0,3,0
RProt=127
LastStart=126:23:103:122:48:220:226:64:
LastEnd=29:101:254:126:48:220:226:64:
S200=1
SPar=1
SSav=1
SLast=200
SSiz=1184165
LFiles=1
LSize=1184165
ImgDim=0,0,0,0
PrevURL=http://www.parlament.gv.at/portal/page?_pageid=908,98803&_dad=portal&_schema=PORTAL&P_NR=XXII

MJenny
Oleg Chernavin
10/04/2005 11:46 am
You simply need to remove the PDF extension from the Text category and add it to User Defined. Then uncheck the whole Text category (keeping all extensions in Text checked).

Oleg.