I'm setting no level limit and loading files only within the starting server, but it won't even fetch
http://semanticsarchive.net/Archive/TVlNGI4M/adgerramchand.pdf
Any ideas what's going wrong here?
Cheers,
doonyakka
Try this:
----------
Project Properties:
Uncheck "Level limit"
Do not download existing files
File filters:
Text:
Load only from the starting server
Other:
Load using URL filters settings
URL filters:
Server:
Load files only within the starting: Server
Directory:
Load from all directories
Filename:
Custom filenames configuration
View included files keywords:
browse.pl?sortbydate
browse.pl?sortbyauthor
http://semanticsarchive.net/archive/*
----------
HTH
>
> Try this:
>
[snip]
----------
>
> HTH
>
Using those settings doesn't make any difference for me, unfortunately. It still only executes browse.pl and downloads a default.htm file with links to the Archive and its various subdirectories (where all the PDFs are!). Then it stops.
Is it working for you?
Thanks for your help.
doonyakka
Yes. One of the first pdf downloads is:
http://semanticsarchive.net/Archive/2ExNzlkZ/ladusaw.salt4.pdf
(look in the Map list)
To be safe you can try the following:
Right-click on your Project. Delete... Only Project files
Project URLs:
http://semanticsarchive.net/
Have you really unchecked the Level limit?
Download All files
(You can change this download setting afterwards to whatever works best for you.)
File filters:
Check "Text":
Is "pdf" in your Extensions list and is it unchecked? Then please change it.
Check "Other":
Load using URL filters settings
(Uncheck "Maximum..." and "Minimum...")
You can check the following category:
User Defined
Uncheck all other categories (Images, ...)
Filename:
Custom filenames configuration
View included files keywords:
browse.pl?sortbydate
browse.pl?sortbyauthor
http://semanticsarchive.net/archive/*
(You can delete "browse.pl?sortbydate" or "browse.pl?sortbyauthor" if you don't need it)
Please look if you have some keywords in "View excluded files keywords".
Please delete all keywords in this section.
If this still doesn't work for you, then post your Project settings:
Mark your project.
Click on the Copy button.
Paste the setting in a new message.
> File filters:
> Check "Text":
> Is "pdf" in your Extensions list and is it unchecked?
If the answer is "yes":
> Then please change it.
Perhaps you want to download more files from the server:
*.doc, *.ps, *.rtf, ...
Please add the desired extensions to the "View included files keywords" list.
>
> Mark your project.
> Click on the Copy button.
> Paste the setting in a new message.
It's still not working, unfortunately. Apart from the 'URL Filters - Filenames' keywords you suggested, these are the settings I've had all along:
[Object]
OEVersion=Pro 3.3.0.1788
Type=1
IID=7010
Caption=http://semanticsarchive.net/
URL=http://semanticsarchive.net/
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FMGroup=1
FTText.Exts=htmlhtmaspjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmpdfpsdoc oooooooooxoooooxxx
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppng
FTVideo.Exts=mpgavianimpegmovfliflcviv
FTAudio.Exts=wavriffmp3midmp2m3u
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakace
FTUDef.Exts=jscss
FTText.B=ooxooo
FTImages.B=xoxooo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=xoxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
RSrvsBx=1
RFileBx=2
RFileEx=http://semanticsarchive.net/*pdfpstxtdoc xxxxx
RProt=63
LastStart=30:63:203:236:206:172:226:64:
LastEnd=4:144:218:236:206:172:226:64:
S200=1
SPar=1
SSav=1
SLast=200
SSiz=2285
LFiles=1
LSize=2285
Flags=1
ImgDim=0,0,0,0
PrevURL=http://semanticsarchive.net/
Thanks for your help!
Yes, with your settings it can't be done.
> Apart from the 'URL Filters - Filenames' keywords you suggested,
> these are the settings I've had all along:
Was this the setting you had before? Otherwise:
1.
Why have you unchecked nearly all extensions in
File Filters: Text?
I think it's never a good idea to exclude so many standard file types. Please enable all of them; you won't get much junk. However, I don't recommend putting the pdf, doc, and ps extensions in this list. Please delete them. Don't uncheck them, delete them! You can put them in the "View included files keywords" list.
You can uncheck some extensions if you are really sure that you don't need them and that they don't contain any important links.
2.
You haven't enabled File Filters... Other
3.
And now a frequent fault, although I mentioned it twice:
| View included files keywords:
| browse.pl?sortbydate
| browse.pl?sortbyauthor
| http://semanticsarchive.net/archive/*
|
| (You can delete "browse.pl?sortbydate" or "browse.pl?sortbyauthor" if you don't need it)
|
| Please look if you have some keywords in "View excluded files keywords".
| Please delete all keywords in this section.
You mixed up the "View *included* files keywords" and "View *excluded* files keywords" categories.
Please add the following keywords in the "View *included* files keywords" and delete all keywords in "View *excluded* files keywords".
View *included* files keywords:
browse.pl?sortbydate
browse.pl?sortbyauthor
http://semanticsarchive.net/archive/*
ps
doc
rtf
You don't have to put "txt" in here, because the txt extension should be one of the standard enabled Text extensions in File Filters | Text.
| (You can delete "browse.pl?sortbydate" or "browse.pl?sortbyauthor" if you don't need it)
But please: Don't delete both of them.
You have http://semanticsarchive.net/* in your keywords list; please replace it with:
http://semanticsarchive.net/archive/* in "View *included* files keywords".
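To make the role of the included keywords more concrete, here is a minimal Python sketch of keyword filtering. It assumes that keywords are matched case-insensitively as substrings of the URL and that "*" stands for any run of characters. That is only a guess at the behaviour, not the program's documented matching rule, and the names keyword_to_regex and matches_included are made up for illustration.

```python
import re

# Keywords taken from the "View included files keywords" list in this thread.
INCLUDED_KEYWORDS = [
    "browse.pl?sortbydate",
    "browse.pl?sortbyauthor",
    "http://semanticsarchive.net/archive/*",
    "ps",
    "doc",
    "rtf",
]


def keyword_to_regex(keyword):
    """Turn a keyword into a regex; '*' matches any run of characters.

    ASSUMPTION: every other character (including '?') is taken literally.
    """
    parts = [re.escape(part) for part in keyword.lower().split("*")]
    return re.compile(".*".join(parts))


def matches_included(url, keywords=INCLUDED_KEYWORDS):
    """Return True if any keyword occurs in the URL (case-insensitive)."""
    lowered = url.lower()
    return any(keyword_to_regex(k).search(lowered) for k in keywords)


if __name__ == "__main__":
    pdf = "http://semanticsarchive.net/Archive/2ExNzlkZ/ladusaw.salt4.pdf"
    print(matches_included(pdf))
```

Under these assumptions the narrower keyword http://semanticsarchive.net/archive/* only admits URLs below the Archive directory, whereas http://semanticsarchive.net/* would admit every URL on the server.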
If you get an XML error message when you browse your project (in "browse.pl?sortbydate" or "browse.pl?sortbyauthor"), I have a quick workaround, but I don't know why this happens (and I'm not familiar with those codes).
I hope that I haven't forgotten to mention anything...
You know: if it still doesn't work:
| Mark your project.
| Click on the Copy button.
| Paste the setting in a new message.
HTH
[Object]
OEVersion=Pro 3.3.0.1788
Type=1
IID=7010
Caption=http://semanticsarchive.net/
URL=http://semanticsarchive.net/
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FMGroup=1
FTText.Exts=htmlhtmaspjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfm xxxxxxxxxxxxxxx
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppng
FTVideo.Exts=mpgavianimpegmovfliflcviv
FTAudio.Exts=wavriffmp3midmp2m3u
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakace
FTUDef.Exts=jscss
FTText.B=ooxooo
FTImages.B=xoxooo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
RSrvsBx=1
RFileBx=2
RFileIn=docpdfpshttp://semanticsarchive.net/archive/*browse.pl?sortbydate xxxxx
RProt=63
LastStart=198:124:4:15:214:172:226:64:
LastEnd=229:109:28:18:214:172:226:64:
S200=11
SPar=11
SSav=11
SLast=200
SSiz=646448
SMdf=10
LFiles=11
LSize=757940
Flags=1
ImgDim=0,0,0,0
PrevURL=http://semanticsarchive.net/
There are no reasons for it.
> I promise I'm not normally this stupid! ;-)
Of course, I'm pretty sure. And this has nothing to do with stupidity. ;-)
> I've checked and double-checked my settings, but it's still not working.
Do you remember what I said?
| I hope that I haven't forgotten to mention anything...
And I did...
By checking and double-checking your settings you could also have seen the fault. ;-)
Quote from my first posting:
| File filters:
| Text:
| Load only from the starting server
Please correct that.
I really hope that it works now.
You Know: If not, ...
;-)
>
> You Know: If not, ...
>
> ;-)
It worked! Thank you for your patience; I don't know if I'd have been so tolerant. Why did the 'File Filters - Text - Load only from starting server' setting make so much difference? I'll try to learn to interpret those logs, so maybe I won't have to bother you again! Sorry it took so long...
Hey, that's good! :-)
> Thank you for your patience; I don't know if I'd have been so tolerant.
No problem, nobody is perfect...
> Why did the 'File Filters - Text - Load only from starting server' setting make so much difference?
The "problem" is the directories under "http://semanticsarchive.net/Archive/", such as this one:
http://semanticsarchive.net/Archive/zY5MDY5Y
zY5MDY5Y looks like a file without extension, but it must also be handled as a directory that has to be parsed:
http://semanticsarchive.net/Archive/zY5MDY5Y/
With the mentioned keywords in "View included files keywords" and the Text file filter set to "Load using URL filters settings", zY5MDY5Y isn't handled the way you want, so the download stops.
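The ambiguity can be shown with a small Python sketch. It only inspects the URL text using the standard library; it is not how Offline Explorer decides internally, and the helper names are invented for illustration.

```python
import posixpath
from urllib.parse import urlparse


def last_segment_extension(url):
    """Return the extension of the last path segment, or '' if it has none."""
    name = posixpath.basename(urlparse(url).path)
    return posixpath.splitext(name)[1].lstrip(".").lower()


def is_explicit_directory(url):
    """A trailing slash is the only hint in the URL text that it is a directory."""
    return urlparse(url).path.endswith("/")


if __name__ == "__main__":
    for url in (
        "http://semanticsarchive.net/Archive/zY5MDY5Y",
        "http://semanticsarchive.net/Archive/zY5MDY5Y/",
        "http://semanticsarchive.net/Archive/2ExNzlkZ/ladusaw.salt4.pdf",
    ):
        print(url, repr(last_segment_extension(url)), is_explicit_directory(url))
```

Without the trailing slash, zY5MDY5Y has no extension and nothing marks it as a directory, so a filter that works from extensions or keywords cannot classify it as a page to parse; the link has to be loaded as text for the PDFs beneath it to be found.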