How to download www.howstuffworks.com

Florin 08/27/2004 04:06 pm
Could you please help me download http://computer.howstuffworks.com/? I don't know how to set the filters so that I won't end up with unnecessary junk on my HDD. Thanks in advance!
08/28/2004 10:21 am
> Could you please help me in downloading http://computer.howstuffworks.com/

There seems to be a lot of useless stuff on the server. Here is an example of project settings that should avoid most of the waste. You can fine-tune them afterwards.

---------------
Project:

uncheck "Level Limit"

URLs:
http://computer.howstuffworks.com/

-----

File Filters:

Text:
Load using URL Filters settings

Images:
Add "swf" to the Extensions list (there are Flash files on the server)
Load from any site

User Defined:
Load only from the starting server

-----

URL Filters:

Server:
Load only within the starting Server

Directory:
Custom directories configuration

View included directories keywords:
http://computer.howstuffworks.com/

View excluded directories keywords:
http://computer.howstuffworks.com/computer*
http://computer.howstuffworks.com/community*

Filename:
Custom filenames configuration

View excluded files keywords:
rate.htm
parent=
gadget*.htm
bookstore-gateway
newsletter*.htm
news-item*.htm
book*.htm

(You can delete some of these file filters or add others; I don't know your preferences.
Possible additions: cp-*.htm or link-archive.htm)

---------------
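For anyone wondering how these filters combine, here is a rough Python sketch of the decision for a single page URL. It is only a model under stated assumptions: a keyword is taken to match if it occurs anywhere in the URL's directory part (or in the file name, query string included), with * as a wildcard, and the host comparison stands in for "Load only within the starting Server". Offline Explorer's real matching rules may differ, and images are left out because they are set to load from any site. The example URLs are made up.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Sketch of the settings above. ASSUMPTION: a keyword matches if it occurs
# anywhere in the text, "*" being a wildcard; the real program may differ.
START_SERVER = "computer.howstuffworks.com"
INCLUDED_DIRS = ["http://computer.howstuffworks.com/"]
EXCLUDED_DIRS = [
    "http://computer.howstuffworks.com/computer*",
    "http://computer.howstuffworks.com/community*",
]
EXCLUDED_FILES = ["rate.htm", "parent=", "gadget*.htm", "bookstore-gateway",
                  "newsletter*.htm", "news-item*.htm", "book*.htm"]


def keyword_matches(keyword: str, text: str) -> bool:
    """True if the wildcard keyword occurs anywhere in text."""
    return fnmatchcase(text, f"*{keyword}*")


def split_url(url: str) -> tuple[str, str, str]:
    """Return (host, directory URL, file name including any query string)."""
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")
    if parts.query:
        filename += "?" + parts.query
    return parts.netloc, f"{parts.scheme}://{parts.netloc}{directory}/", filename


def accept(url: str) -> bool:
    """Decide whether a page URL would be downloaded under these filters."""
    host, directory, filename = split_url(url)
    if host != START_SERVER:  # "Load only within the starting Server"
        return False
    if not any(keyword_matches(k, directory) for k in INCLUDED_DIRS):
        return False
    if any(keyword_matches(k, directory) for k in EXCLUDED_DIRS):
        return False
    return not any(keyword_matches(k, filename) for k in EXCLUDED_FILES)


# Hypothetical URLs, just to exercise each rule.
for url in ["http://computer.howstuffworks.com/example-article.htm",
            "http://computer.howstuffworks.com/community/example.htm",
            "http://computer.howstuffworks.com/rate.htm?id=1",
            "http://www.howstuffworks.com/example-article.htm"]:
    print(accept(url), url)
```

Only the first URL passes: the second is in an excluded directory, the third hits the rate.htm filter and the fourth is on another server.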

HTH
Florin 08/28/2004 11:36 am
Thanks, I'll try that!
Florin 08/28/2004 12:37 pm
I've tried that, but it won't download at all. Here are the files that show up in the "MAP" area; maybe that will help you help me :)

+ -- static.howstuffworks.com
|__+gif -- a bunch of GIF files used by howstuffworks.com/index.html
+ -- www.howstuffworks.com
| |__+gif -- just 2 files
|___+javascript -- main.js
|___default.htm
|___index.htm
|___space.gif

I hope you can understand that diagram. But why are only those files downloaded? I have unchecked "Level Limit".

The program itself seems to work fine, because I'm downloading another project right now (with the default settings, except for a "load from another server" limit set to 3).
Please help again! ;)
08/28/2004 01:46 pm
I've run another short test (I don't want to download the whole site ;-)
It seems to work fine.

The best way to find out what's wrong with your settings:
Select your project in the project tree.
Click the "Copy" button.
Paste your settings into a new message.
Florin 08/28/2004 02:09 pm
> Paste your settings in a new message.
OK, here it is:

[Object]
OEVersion=Pro 3.2.0.1708
Type=0
IID=7015
Caption=http://www.howstuffworks.com/
URL=http://www.howstuffworks.com/
Lev=1000001
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwfswf xxxxxxxxxxxxxxxx
FTVideo.Exts=mpgavianimpegmovfliflcvivrmramrvasfasxwmvm1vm2vvob
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejar
FTUDef.Exts=jscssssivbsdtdxslswf
FTText.B=oooooo
FTImages.B=oooooo
FTVideo.B=oooooo
FTAudio.B=oooooo
FTArchive.B=oooooo
FTUDef.B=oooooo
FTOther.B=oooooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,3,0,3,0
RSrvsBx=1
RPathBx=2
RPathIn=http://computer.howstuffworks.com/computer*http://computer.howstuffworks.com/community* xx
RPathEx=http://computer.howstuffworks.com/ x
RFileBx=2
RFileIn=printableversion x
RFileEx=rate.htmparent=gadget*.htmbookstore-gatewaynewsletter*.htmnews-item*.htmbook*.htm xxxxxxx
RProt=63
LastStart=194:21:139:217:121:170:226:64:
LastEnd=230:128:251:5:122:170:226:64:
S200=1
S304=49
SPar=2
SSav=1
SLast=304
SSiz=67230
SMdf=1
LFiles=50
LSize=67230
Flags=1
ImgDim=0,0,0,0
PrevURL=http://www.howstuffworks.com/
08/28/2004 03:25 pm
It seems you have made some mistakes:

Please start a new project and enter the settings I listed above.

The starting URL is:
http://computer.howstuffworks.com/
(see your first message)
You used http://www.howstuffworks.com/ instead.

Do not download existing files.

Under File Filters, uncheck:
Video
Audio
Archive
Other


URL Filters:

Directory:
View included directories keywords:
http://computer.howstuffworks.com/

View excluded directories keywords:
http://computer.howstuffworks.com/computer*
http://computer.howstuffworks.com/community*

You have mixed up the "included" and "excluded" settings.

Filename:
View included files keywords:
Remove all filters in the "included" category.

I assume that all the "x..." characters in the filters are only there because the binary settings can't be transferred correctly through a plain-text message (if they really are in your filters, uncheck them).
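If you want to check a pasted project for this kind of mix-up before starting it, a rough Python sketch could look like the one below. It reads only an excerpt of your settings and assumes, based on the mix-up described above, that RPathIn holds the included and RPathEx the excluded directory keywords. The regular expression that splits the keywords is my own workaround for the separators lost in the paste, not part of the program, and the helper names are hypothetical.

```python
import configparser
import re
from urllib.parse import urlsplit

# Excerpt of the pasted project settings (only the keys this check reads).
EXPORTED = """[Object]
URL=http://www.howstuffworks.com/
RPathIn=http://computer.howstuffworks.com/computer*http://computer.howstuffworks.com/community* xx
RPathEx=http://computer.howstuffworks.com/ x
"""


def keywords(value: str) -> list[str]:
    """Split a run of URL keywords whose separators were lost in the paste."""
    return re.findall(r"https?://.*?(?=https?://|\s|$)", value)


def check_project(text: str) -> list[str]:
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep key case such as "RPathIn"
    parser.read_string(text)
    obj = parser["Object"]
    start_host = urlsplit(obj.get("URL", "")).hostname
    # ASSUMPTION: RPathIn = included, RPathEx = excluded directory keywords.
    included = keywords(obj.get("RPathIn", ""))
    excluded = keywords(obj.get("RPathEx", ""))
    problems = []
    if included and start_host not in {urlsplit(k).hostname for k in included}:
        problems.append(f"start server {start_host} is not in any included directory")
    for k in excluded:
        if urlsplit(k).path in ("", "/"):
            problems.append(f"excluded keyword {k} covers a whole server")
    return problems


for problem in check_project(EXPORTED):
    print(problem)
```

On this excerpt it reports both problems: the start server www.howstuffworks.com is not covered by any included directory, and the excluded keyword http://computer.howstuffworks.com/ shuts out the whole server.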

HTH
Florin 08/28/2004 03:36 pm
Indeed, my mistake. But what I really want is to download XXX.howstuffworks.com, where XXX stands for computer and all the other categories. I'll try adding the xxx.howstuffworks.com servers to the filters too. Thanks very much for your help!
08/28/2004 05:09 pm
> I really want to download XXX.howstuffworks.com,

Oh, that will be a really large project, perhaps more than 1 GB. That's a (nearly) completely different thing.
But I hope you now have an idea of how to filter the files. You will have to add some filters, especially to the included and excluded directories keywords. If you want to download entire categories, I would also delete the "gadget*.htm" filter, replace "book*.htm" with "^book*.htm", and so on. I'm sure you will find the best settings. Always keep an eye on the download process so that no junk ends up on your HD.
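To illustrate why "^book*.htm" is safer than "book*.htm" once the project gets wider, here is a small Python sketch. It assumes that a plain keyword may match anywhere in the file name, that "*" matches any run of characters, and that a leading "^" pins the keyword to the start of the name; please check the program's help for the exact syntax. The file names are made up.

```python
import re


def keyword_to_regex(keyword: str) -> re.Pattern:
    """Translate a file-name keyword into a compiled regex.

    ASSUMPTION: "*" matches any run of characters, and a leading "^" pins
    the keyword to the start of the file name; otherwise it may match anywhere.
    """
    anchored = keyword.startswith("^")
    body = keyword[1:] if anchored else keyword
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(("^" if anchored else "") + pattern)


def matches(keyword: str, filename: str) -> bool:
    return keyword_to_regex(keyword).search(filename) is not None


# Hypothetical file names.
for name in ["book-reviews.htm", "notebook-computers.htm"]:
    print(name, matches("book*.htm", name), matches("^book*.htm", name))
```

Under these assumptions "book*.htm" also catches a name like notebook-computers.htm, while "^book*.htm" only excludes names that start with "book".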

Good luck!
Oleg Chernavin 08/28/2004 06:18 pm
Hello,

I just saw this discussion. I hope you will be able to download the site. If none of the above helps, I will do my best to assist you. Just let me know.

Best regards,
Oleg Chernavin
MP Staff
babu 03/31/2005 09:56 am
Hello Oleg Chernavin,

Can you please help me download this website?

Regards,
Babu
Oleg Chernavin 03/31/2005 10:27 am
OK. Please let me know what went wrong when you tried to download it.

Oleg.
babu 04/06/2005 04:26 am
I want to download only the printable versions of all the articles listed in howstuffworks.com/big.htm.
I somehow managed to get the printable URLs of all the articles and downloaded them too. But whenever I open an article offline, a script error (I think a JavaScript error) appears, and that's irritating.

Is there a predefined program or template that can download only the printable versions of all the articles without any errors, and update them automatically and frequently?

Thanks
babu
Oleg Chernavin 04/08/2005 08:12 am
I would suggest doing it the following way:

Create a new Project in Offline Explorer Pro with the URL:

http://www.howstuffworks.com/big.htm

Open its Properties dialog.

Set Level to 1. Under URL Filters | Server, select "Load from the starting Domain".
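The Domain option matters here because big.htm is on www.howstuffworks.com, while, as this thread shows, content also lives on other hosts such as computer.howstuffworks.com and static.howstuffworks.com. A rough Python sketch of the difference, assuming the "domain" is simply the last two labels of the starting host (the program may handle cases such as country-code domains differently); the function names and the article URL are hypothetical:

```python
from urllib.parse import urlsplit


def same_server(url: str, start_url: str) -> bool:
    # Model of "Load only within the starting Server": exact host match.
    return urlsplit(url).hostname == urlsplit(start_url).hostname


def same_domain(url: str, start_url: str, labels: int = 2) -> bool:
    # Model of "Load from the starting Domain". ASSUMPTION: the domain is the
    # last `labels` parts of the starting host (howstuffworks.com here).
    host = urlsplit(url).hostname or ""
    start_host = urlsplit(start_url).hostname or ""
    domain = ".".join(start_host.split(".")[-labels:])
    return host == domain or host.endswith("." + domain)


start = "http://www.howstuffworks.com/big.htm"
link = "http://computer.howstuffworks.com/example.htm"  # hypothetical article URL
print(same_server(link, start), same_domain(link, start))
```

With the Server option a link to computer.howstuffworks.com would be skipped; with the Domain option it is followed.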

Go to Advanced. Click the URL Substitutes button. Add a new rule:

URL:
*
Replace:
.htm
With:
.htm/printable
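Read literally, this rule turns each .htm link into its printable counterpart. A minimal Python sketch of that rewriting, assuming the URL mask * behaves like a shell-style wildcard and that every occurrence of the search text is replaced (the program might replace only the first one); apply_substitute is a hypothetical name:

```python
from fnmatch import fnmatchcase


def apply_substitute(url: str, mask: str = "*", find: str = ".htm",
                     replace_with: str = ".htm/printable") -> str:
    """Rewrite url the way the URL Substitutes rule reads literally.

    ASSUMPTION: every occurrence of `find` is replaced when `url` matches
    `mask`; the real program may behave differently.
    """
    if not fnmatchcase(url, mask):
        return url
    return url.replace(find, replace_with)


# Hypothetical article URL.
print(apply_substitute("http://www.howstuffworks.com/example.htm"))
```

Note that under this literal reading a link ending in .html would come out as .htm/printablel, so if big.htm links to .html pages, a narrower rule may be needed.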

Click OK, and then OK again. Start loading the Project. Does this work?

Oleg.