Problem downloading from specific site

Author Message
doonyakka 09/15/2004 08:42 pm
This should be a simple operation, but I can't get OE to download PDFs from http://semanticsarchive.net

I'm setting no level limit and loading files only within the starting server, but it won't even fetch
http://semanticsarchive.net/Archive/TVlNGI4M/adgerramchand.pdf

Any ideas what's going wrong here?

Cheers,
doonyakka
09/15/2004 09:44 pm
> Any ideas what's going wrong here?

Try this:

----------
Project Properties:

Uncheck "Level limit"
Do not download existing files

File filters:
Text:
Load only from the starting server

Other:
Load using URL filters settings

URL filters:
Server:
Load files only within the starting: Server

Directory:
Load from all directories

Filename:
Custom filenames configuration

View included files keywords:
browse.pl?sortbydate
browse.pl?sortbyauthor
http://semanticsarchive.net/archive/*
pdf
----------

HTH
doonyakka 09/15/2004 10:09 pm
> > Any ideas what's going wrong here?
>
> Try this:
>
[snip]
----------
>
> HTH
>

Using those settings doesn't make any difference for me, unfortunately. It still only executes browse.pl and downloads a default.htm file with links to the Archive and its various subdirectories (where all the PDFs are!). Then it stops.

Is it working for you?

Thanks for your help.

doonyakka
09/15/2004 11:16 pm
> Is it working for you?

Yes. One of the first pdf downloads is:
http://semanticsarchive.net/Archive/2ExNzlkZ/ladusaw.salt4.pdf
(look in the Map list)

To be safe you can try the following:

Right-click on your Project. Delete... Only Project files

Project URLs:
http://semanticsarchive.net/

Have you really unchecked the Level limit?

Download All files
(You can change this download setting afterwards to whatever is best for you.)

File filters:
Check "Text":
Is "pdf" in your Extensions list and is it unchecked? Then please change it.

Check "Other":
Load using URL filters settings
(Uncheck "Maximum..." and "Minimum...")

You can check the following category:
User Defined

Uncheck all other categories (Images, ...)

Filename:
Custom filenames configuration

View included files keywords:
browse.pl?sortbydate
browse.pl?sortbyauthor
http://semanticsarchive.net/archive/*
pdf

(You can delete "browse.pl?sortbydate" or "browse.pl?sortbyauthor" if you don't need it)

Please check whether you have any keywords in "View excluded files keywords".
Please delete all keywords in this section.

If this still doesn't work for you, then post your Project settings:

Mark your project.
Click on the Copy button.
Paste the setting in a new message.
09/16/2004 01:02 am
Even though it should be clear, let me add a few words:

> File filters:
> Check "Text":
> Is "pdf" in your Extensions list and is it unchecked?
If the answer is "yes":
> Then please change it.

Perhaps you want to download more files from the server:
*.doc, *.ps, *.rtf, ...

Please add the desired extensions to the "View included files keywords" list.
doonyakka 09/16/2004 06:22 am
> If this still doesn't work for you, then post your Project settings:
>
> Mark your project.
> Click on the Copy button.
> Paste the setting in a new message.

It's still not working, unfortunately. Apart from the 'URL Filters - Filenames' keywords you suggested, these are the settings I've had all along:

[Object]
OEVersion=Pro 3.3.0.1788
Type=1
IID=7010
Caption=http://semanticsarchive.net/
URL=http://semanticsarchive.net/
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FMGroup=1
FTText.Exts=htmlhtmaspjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmpdfpsdoc oooooooooxoooooxxx
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppng
FTVideo.Exts=mpgavianimpegmovfliflcviv
FTAudio.Exts=wavriffmp3midmp2m3u
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakace
FTUDef.Exts=jscss
FTText.B=ooxooo
FTImages.B=xoxooo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=xoxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
RSrvsBx=1
RFileBx=2
RFileEx=http://semanticsarchive.net/*pdfpstxtdoc xxxxx
RProt=63
LastStart=30:63:203:236:206:172:226:64:
LastEnd=4:144:218:236:206:172:226:64:
S200=1
SPar=1
SSav=1
SLast=200
SSiz=2285
LFiles=1
LSize=2285
Flags=1
ImgDim=0,0,0,0
PrevURL=http://semanticsarchive.net/

Thanks for your help!
09/16/2004 08:21 am
> It's still not working, unfortunately.

Yes, with your settings it can't be done.

> Apart from the 'URL Filters - Filenames' keywords you suggested,
> these are the settings I've had all along:

Was this the setting you had before? Otherwise:

1.
Why have you unchecked nearly all extensions in
File Filters: Text ?

I think it's never a good idea to exclude so many standard file types. Please enable all of them; you won't get much junk. However, I wouldn't recommend putting the pdf, doc, and ps extensions in this list. Please delete them. Don't uncheck them, delete them! You can put them in the "View included files keywords" list.
You can uncheck some extensions if you are really sure that you don't need them and that they don't contain any important links.

2.
You haven't enabled File Filters... Other

3.
And now a frequent fault, although I mentioned it twice:

| View included files keywords:
| browse.pl?sortbydate
| browse.pl?sortbyauthor
| http://semanticsarchive.net/archive/*
| pdf
|
| (You can delete "browse.pl?sortbydate" or "browse.pl?sortbyauthor" if you don't need it)
|
| Please look if you have some keywords in "View excluded files keywords".
| Please delete all keywords in this section.

You mixed up the "View *included* files keywords" and "View *excluded* files keywords" categories.

Please add the following keywords to "View *included* files keywords" and delete all keywords in "View *excluded* files keywords".

View *included* files keywords:
browse.pl?sortbydate
browse.pl?sortbyauthor
http://semanticsarchive.net/archive/*
pdf
ps
doc
rtf

You don't have to put "txt" in here, because the txt extension should be one of the standard enabled Text extensions in File Filters | Text.

| (You can delete "browse.pl?sortbydate" or "browse.pl?sortbyauthor" if you don't need it)

But please: Don't delete both of them.

You have http://semanticsarchive.net/* in your keywords list; please replace it with
http://semanticsarchive.net/archive/* in "View *included* files keywords".
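
To illustrate why the narrower keyword is the more targeted one, here is a minimal Python sketch. It is not how OE matches keywords internally; it assumes, purely for illustration, that a keyword containing * behaves like a shell-style wildcard compared against the whole URL without regard to case. The helper name matches() is hypothetical.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, url: str) -> bool:
    """Shell-style wildcard match of a keyword against a full URL.

    Assumption: case is ignored. OE's real matching rules may differ.
    """
    return fnmatchcase(url.lower(), pattern.lower())


BROAD = "http://semanticsarchive.net/*"
NARROW = "http://semanticsarchive.net/archive/*"

# Both URLs are taken from this thread.
URLS = [
    "http://semanticsarchive.net/",
    "http://semanticsarchive.net/Archive/2ExNzlkZ/ladusaw.salt4.pdf",
]

for url in URLS:
    print(url)
    print("  broad keyword matches: ", matches(BROAD, url))
    print("  narrow keyword matches:", matches(NARROW, url))
```

Under these assumptions the broad keyword matches every URL on the server, including the start page, while the narrow one matches only URLs below the Archive directory.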

If you get an XML error message when you browse your project (in "browse.pl?sortbydate" or "browse.pl?sortbyauthor"), I have a short workaround, but I don't know why it happens (and I'm not familiar with that code).

I hope that I haven't forgotten to mention anything...
You know: if it still doesn't work:

| Mark your project.
| Click on the Copy button.
| Paste the setting in a new message.

HTH
doonyakka 09/16/2004 11:49 am
I feel like an idiot. I promise I'm not normally this stupid! ;) I've checked and double-checked my settings, but it's still not working. Sorry...

[Object]
OEVersion=Pro 3.3.0.1788
Type=1
IID=7010
Caption=http://semanticsarchive.net/
URL=http://semanticsarchive.net/
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FMGroup=1
FTText.Exts=htmlhtmaspjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfm xxxxxxxxxxxxxxx
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppng
FTVideo.Exts=mpgavianimpegmovfliflcviv
FTAudio.Exts=wavriffmp3midmp2m3u
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakace
FTUDef.Exts=jscss
FTText.B=ooxooo
FTImages.B=xoxooo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
RSrvsBx=1
RFileBx=2
RFileIn=docpdfpshttp://semanticsarchive.net/archive/*browse.pl?sortbydate xxxxx
RProt=63
LastStart=198:124:4:15:214:172:226:64:
LastEnd=229:109:28:18:214:172:226:64:
S200=11
SPar=11
SSav=11
SLast=200
SSiz=646448
SMdf=10
LFiles=11
LSize=757940
Flags=1
ImgDim=0,0,0,0
PrevURL=http://semanticsarchive.net/
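
For anyone comparing two pasted dumps like the ones in this thread, a small Python sketch along the following lines can list which keys changed. It only treats the dump as plain Key=Value lines and does not try to interpret the encoded values; the function names are made up for this example, and the two excerpts contain only a few keys copied from the dumps above.

```python
def parse_settings(text):
    """Parse 'Key=Value' lines into a dict, skipping headers such as [Object]."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key] = value
    return settings


def diff_settings(old, new):
    """Return {key: (old_value, new_value)} for keys whose values differ.

    A key missing on one side is reported with None on that side.
    """
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


# Excerpts from the first and second dump posted in this thread.
FIRST = """\
[Object]
FTText.B=ooxooo
FTOther.B=xoxooo
RFileBx=2
RFileEx=http://semanticsarchive.net/*pdfpstxtdoc xxxxx
"""

SECOND = """\
[Object]
FTText.B=ooxooo
FTOther.B=ooxooo
RFileBx=2
RFileIn=docpdfpshttp://semanticsarchive.net/archive/*browse.pl?sortbydate xxxxx
"""

changes = diff_settings(parse_settings(FIRST), parse_settings(SECOND))
for key, (before, after) in changes.items():
    print(f"{key}: {before!r} -> {after!r}")
```

With these excerpts the differences are FTOther.B, RFileEx (present only in the first dump) and RFileIn (present only in the second). FTText.B is identical in both, so it is not reported as a change.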
09/16/2004 05:37 pm
> I feel like an idiot.

There's no reason to.

> I promise I'm not normally this stupid! ;-)

Of course, I'm pretty sure of that. And this has nothing to do with stupidity. ;-)

> I've checked and double-checked my settings, but it's still not working.

Do you remember what I said?

| I hope that I haven't forgotten to mention anything...

And I did...
By checking and double-checking your settings you could also have spotted the fault. ;-)

Quote from my first posting:

| File filters:
| Text:
| Load only from the starting server

Please correct that.

I really hope that it works now.

You know: if not, ...

;-)
doonyakka 09/16/2004 06:54 pm
> I really hope that it works now.
>
> You Know: If not, ...
>
> ;-)

It worked! Thank you for your patience; I don't know if I'd have been so tolerant. Why did the 'File Filters - Text - Load only from starting server' setting make so much difference? I'll try to learn to interpret those logs, so maybe I won't have to bother you again! Sorry it took so long...
09/16/2004 10:56 pm
> It worked!

Hey, that's good! :-)

> Thank you for your patience; I don't know if I'd have been so tolerant.

No problem, nobody is perfect...

> Why did the 'File Filters - Text - Load only from starting server' setting make so much difference?

The "problem" is the directories under "http://semanticsarchive.net/Archive/" like this one:
http://semanticsarchive.net/Archive/zY5MDY5Y

zY5MDY5Y looks like a file without extension, but it must also be handled as a directory that has to be parsed:
http://semanticsarchive.net/Archive/zY5MDY5Y/

With the mentioned keywords in "View included files keywords" and the Text File filter set to "Load using URL filters settings", zY5MDY5Y isn't handled the way you want. The download stops.
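
To make the "file without extension" point concrete, here is a small Python sketch. It is only an illustration of how a filter that goes purely by the last path segment might see these URLs, not a description of OE's actual logic; the function classify() is hypothetical.

```python
from posixpath import splitext
from urllib.parse import urlsplit


def classify(url: str) -> str:
    """Guess how a purely extension-based filter might see a URL.

    Illustrative assumption only: a path ending in '/' is a directory,
    anything else is a file judged by the extension of its last segment.
    """
    path = urlsplit(url).path
    if path == "" or path.endswith("/"):
        return "directory"
    last_segment = path.rsplit("/", 1)[-1]
    ext = splitext(last_segment)[1]
    if ext:
        return "file with extension " + ext
    return "file without extension"


for url in (
    "http://semanticsarchive.net/Archive/zY5MDY5Y",
    "http://semanticsarchive.net/Archive/zY5MDY5Y/",
    "http://semanticsarchive.net/Archive/2ExNzlkZ/ladusaw.salt4.pdf",
):
    print(url, "=>", classify(url))
```

Seen this way, the link without the trailing slash looks like an extensionless file, and only the form with the trailing slash is obviously a directory that has to be parsed for further links.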