I have a problem with a site's PDF files. The site switched from DOC format to PDF format midway. I was able to retrieve the DOC files with URL macros without a problem, but not a single PDF has been downloaded. I added the pdf file type, but it doesn't work. The only way I can get the PDFs downloaded is by explicitly giving the complete path and file name. However, I can't do that a thousand times.
For example:
http://www.parlament.gv.at/pls/portal/docs/page/PG/DE/XXII/J/J_02331/fname_030651.pdf
I want all the PDFs in the directories below the J level. Up to J_00055 they used Word documents with an identical name (daten_000000.doc); since then the PDF name changes each time because it includes a counter. The DOC files were downloaded with this macro:
SINGLEURL=http://www.parlament.gv.at/pls/portal/docs/page/PG/DE/XXII/J/J_0000{:0..9}/daten_000000.doc
This one unfortunately didn't work for the PDFs:
SINGLEURL=http://www.parlament.gv.at/pls/portal/docs/page/PG/DE/XXII/J/J_0000{:56..99}/
Hope you can help.
Thanks in advance
MJenny
You can try to load all possible URLs:
http://www.parlament.gv.at/pls/portal/docs/page/PG/DE/XXII/J/J_0{:0000..9999}/fname_0{:00000..99999}.pdf
But this will take too much time. Are there any Web pages that contain links to PDF files?
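To put a number on "too much time", here is a minimal Python sketch that expands a `{:START..END}` macro and counts the resulting URLs. It is only an illustration: the function names are made up, and it assumes the zero padding follows the width of START as written (which matches J_02331 and fname_030651 in the example above), not necessarily how the program itself expands macros.

```python
import re
from itertools import product

# Assumed macro form: {:START..END}; the width of START sets the zero padding.
MACRO = re.compile(r"\{:(\d+)\.\.(\d+)\}")

TEMPLATE = ("http://www.parlament.gv.at/pls/portal/docs/page/PG/DE/XXII/J/"
            "J_0{:0000..9999}/fname_0{:00000..99999}.pdf")


def macro_ranges(template):
    """Return one (start, end, width) tuple per macro in the template."""
    return [(int(a), int(b), len(a)) for a, b in MACRO.findall(template)]


def count_urls(template):
    """Number of URLs the template expands to (product of the range sizes)."""
    total = 1
    for start, end, _ in macro_ranges(template):
        total *= end - start + 1
    return total


def expand(template):
    """Lazily yield every URL; the first macro varies slowest."""
    ranges = macro_ranges(template)
    pieces = re.split(r"\{:\d+\.\.\d+\}", template)  # literal text around macros
    counters = [range(start, end + 1) for start, end, _ in ranges]
    for combo in product(*counters):
        url = pieces[0]
        for (_, _, width), value, tail in zip(ranges, combo, pieces[1:]):
            url += str(value).zfill(width) + tail
        yield url


print(count_urls(TEMPLATE))   # total number of candidate URLs
print(next(expand(TEMPLATE)))  # first candidate URL
```

The two ranges multiply to 10,000 x 100,000 = 1,000,000,000 candidate URLs. Even at a hypothetical ten requests per second, that is roughly three years of probing.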
Best regards,
Oleg Chernavin
MP Staff
Thanks for the very quick response!
Here is an example of a page with 2 PDF links included: http://www.parlament.gv.at/portal/page?_pageid=908,964749&_dad=portal&_schema=PORTAL
MJenny
What about adding a success counter to URL macros?
E.g., you know there is a file whose name falls in the second macro's range 1000-2000, but you don't know the exact number. The macro runs through the addresses until it hits a valid one (the number of valid addresses to look for could be a user setting). Then it stops the counter for the second macro, adds 1 to the first macro, and runs through the second macro's range again until it scores another hit, and so on. A hierarchy of macro counters would help here.
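The idea could be sketched in Python roughly as follows. This is not an existing feature of the program; `probe_with_success_counter`, `make_url`, `exists` and `hits_wanted` are hypothetical names, and `exists` stands in for whatever check decides that an address is valid (here a lookup in a made-up set, so no network access is needed).

```python
def probe_with_success_counter(outer_values, inner_values, make_url, exists,
                               hits_wanted=1):
    """For each outer value, scan the inner range until enough hits are found.

    Once `hits_wanted` valid addresses are found for one outer value, the
    inner counter stops and the outer counter advances by one step.
    """
    found = []
    for outer in outer_values:
        hits = 0
        for inner in inner_values:
            url = make_url(outer, inner)
            if exists(url):
                found.append(url)
                hits += 1
                if hits >= hits_wanted:
                    break  # stop the inner counter, move on to the next outer value
    return found


# Made-up set of valid addresses, standing in for real server responses.
existing = {"J_00056/fname_01003.pdf", "J_00057/fname_01001.pdf"}

found = probe_with_success_counter(
    outer_values=range(56, 59),
    inner_values=range(1000, 1006),
    make_url=lambda d, n: f"J_{d:05d}/fname_{n:05d}.pdf",
    exists=existing.__contains__,
)
print(found)
```

In this sketch the inner range restarts from its beginning for every outer value. If the file counter only ever grows, resuming from the last hit would save requests, but that is an assumption about the site.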
MJenny
http://www.parlament.gv.at/portal/page?_pageid=908,964749&_dad=portal&_schema=PORTAL
I added the PDF extension to the File Filters | User Defined section and changed its Location to Load from any site. With Level=1, all the PDFs loaded fine.
I think the best way would be to download the whole site and pick up the PDFs this way.
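For illustration, picking PDF links out of a page can be sketched in Python with the standard library. This is not how the program works internally; `PdfLinkParser` and `find_pdf_links` are hypothetical names, and the HTML snippet is a made-up stand-in for the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)  # resolve relative links
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_links.append(absolute)


def find_pdf_links(html, base_url):
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links


# Made-up page content; only the PDF path is taken from the thread.
SAMPLE = """
<a href="/pls/portal/docs/page/PG/DE/XXII/J/J_02331/fname_030651.pdf">Anfrage</a>
<a href="/portal/page?_pageid=908,98803">Overview</a>
"""
BASE = ("http://www.parlament.gv.at/portal/page"
        "?_pageid=908,964749&_dad=portal&_schema=PORTAL")

links = find_pdf_links(SAMPLE, BASE)
print(links)
```

Checking the URL path rather than the whole URL keeps query strings from hiding or faking the .pdf ending.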
Oleg.
Thanks.
The method works, but its drawback is that this is one page address out of several thousand.
I would first need to find all of them and then add them to the URL list.
Ideally the download should start from this page
http://www.parlament.gv.at/portal/page?_pageid=908,98803&_dad=portal&_schema=PORTAL&P_NR=XXII
which is one level up and lists all the cases I'm interested in.
However, starting from that address, I'm unable to retrieve any PDFs.
MJenny
OK. Change that Project's URL to:
http://www.parlament.gv.at/portal/page?_pageid=908,98803&_dad=portal&_schema=PORTAL&P_NR=XXII
Increase Level to 2, click the OK button and redownload the Project.
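Why Level matters here can be shown with a small Python sketch of a depth-limited crawl. It assumes that Level counts link hops from the starting URL; `crawl` and `get_links` are hypothetical names, and the site map is made up (overview page, then case pages, then PDFs).

```python
from collections import deque


def crawl(start_url, max_level, get_links):
    """Breadth-first walk from start_url; returns {url: level reached at}."""
    seen = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        level = seen[url]
        if level >= max_level:
            continue  # do not follow links beyond the level limit
        for link in get_links(url):
            if link not in seen:
                seen[link] = level + 1
                queue.append(link)
    return seen


# Made-up link structure: overview -> case pages -> PDFs.
SITE = {
    "overview": ["case_1", "case_2"],
    "case_1": ["a.pdf"],
    "case_2": ["b.pdf", "c.pdf"],
}


def get_links(url):
    return SITE.get(url, [])


for level in (1, 2):
    pdfs = sorted(u for u in crawl("overview", level, get_links) if u.endswith(".pdf"))
    print(level, pdfs)
```

With the limit at 1 the walk stops at the case pages, so no PDFs are reached. At 2 the PDFs linked from those pages are included.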
Oleg.
Unfortunately, that doesn't work. That was also my first try.
MJenny
> OK. Change that Project's URL to:
>
> http://www.parlament.gv.at/portal/page?_pageid=908,98803&_dad=portal&_schema=PORTAL&P_NR=XXII
>
> Increase Level to 2, click the OK button and redownload the Project.
>
> Oleg.
Oleg.
[Object]
OEVersion=Pro 3.8.0.2028
Type=0
IID=7119
Caption=TEST Schriftliche Anfragen 22.GP
URL=http://www.parlament.gv.at/portal/page?_pageid=908,98803&_dad=portal&_schema=PORTAL&P_NR=XXII
Lev=2
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FMGroup=1
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3pdf ooooooooooooooooooox
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovfliflcvivrmramrvasfasxwmvm1vm2vvob
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejar
FTUDef.Exts=jscssssivbsdtdxslswf
FTText.B=ooxooo
FTImages.B=xoxooo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=xoxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,3,0,3,0
RProt=127
LastStart=126:23:103:122:48:220:226:64:
LastEnd=29:101:254:126:48:220:226:64:
S200=1
SPar=1
SSav=1
SLast=200
SSiz=1184165
LFiles=1
LSize=1184165
ImgDim=0,0,0,0
PrevURL=http://www.parlament.gv.at/portal/page?_pageid=908,98803&_dad=portal&_schema=PORTAL&P_NR=XXII
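A dump like the one above can be read as simple key=value lines. The Python sketch below assumes exactly that and nothing more about the file format; `parse_project` is a hypothetical helper, and the sample repeats a few lines from the dump verbatim.

```python
def parse_project(text):
    """Parse key=value lines into a dict, skipping [Section] headers."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or "=" not in line:
            continue
        # Split on the first "=" only, since URL values contain "=" too.
        key, _, value = line.partition("=")
        settings[key] = value
    return settings


DUMP = """[Object]
Caption=TEST Schriftliche Anfragen 22.GP
URL=http://www.parlament.gv.at/portal/page?_pageid=908,98803&_dad=portal&_schema=PORTAL&P_NR=XXII
Lev=2
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3pdf ooooooooooooooooooox
FTUDef.Exts=jscssssivbsdtdxslswf
"""

settings = parse_project(DUMP)
print(settings["Lev"])
print("pdf" in settings["FTText.Exts"], "pdf" in settings["FTUDef.Exts"])
```

The substring test is only a rough check, because the extensions are run together without separators. In this dump, pdf shows up at the end of FTText.Exts and not in FTUDef.Exts; whether that matters for the download is not clear from the dump alone.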
MJenny
Oleg.