http://web.archive.org/web/20080703083241/http://ciadvertising.org/
I have succeeded in downloading 203,770 files totaling 4.014 GB.
1. When I try to follow a link from the first page in the Offline Explorer Pro browser, it launches IE (current version), which cannot find the files offline (these links are set to open in a new window in the HTML). I have checked the links, and the files all reside on my hard drive but cannot be displayed. IE just shows 170.... and keeps churning away with no error message.
2. I have tried to follow the recommendations from Oleg to client Naomi in this forum of 12/5/2010 entitled "Please help me with re-constructing a site from Wayback Machine!!! garage-door-specialists.co.uk"
3. Here are the settings I have made for this download in Offline Explorer Pro:
(This site, http://www.ciadvertising.org, was archived by the Internet Archive from 2001 to 2009, so there are many copies there)
URL Filters
URL Exclusions:
http://web.archive.org/web/2001*/*/
http://web.archive.org/web/2002*/*/
http://web.archive.org/web/2003*/*/
http://web.archive.org/web/2004*/*/
http://web.archive.org/web/2005*/*/
http://web.archive.org/web/2006*/*/
http://web.archive.org/web/2007*/*/
http://web.archive.org/web/2009*/*/
http://www.utexas.edu/
Server:
checked load only within this server
staticweb.archive.org
www.utexas.edu
http://web.archive.org/web/2001*/*/
http://web.archive.org/web/2002*/*/
http://web.archive.org/web/2003*/*/
http://web.archive.org/web/2004*/*/
http://web.archive.org/web/2005*/*/
http://web.archive.org/web/2006*/*/
http://web.archive.org/web/2007*/*/
http://web.archive.org/web/2009*/*/
http://www.utexas.edu/
Directory:
unchecked Load files only from starting directory and below
Filename:
nothing done here--used default values
Parsing:
set up a rule to remove numbers, and unchecked the option to apply it to files
(I did not do this quite correctly (will re-run), as the numbers were not replaced. I tested this rule, and it does remove the numbers (dates of download on the Wayback Machine) from files):
URL:
http://web.archive.org/web/*www.ciadvertising.org
Replace:
http://web.archive.org/web/*/
I greatly appreciate your help, as I thought this site was lost forever and it represents my life's work as an academician (let alone my students' work). As you may know, the download program recommended on the Internet Archive site no longer works with the changed Wayback Machine, and they indicate it will not work until after August 2011.
john l
BTW, when I try to open a file such as a .gif from the offline downloaded content in Photoshop, to verify that it is on my hard drive, why do I get a message saying it cannot open the format?
I will do the download and try to see what is wrong.
Regarding GIF files: yes, the site uses lots of redirects. When you request a URL, it points you to another timestamped version of the file. So many of the downloaded files are small HTML pages with redirections.
You may open them in a text editor to see the exact location of the GIF and other such files.
Best regards,
Oleg Chernavin
MP Staff
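For anyone who wants to check which downloaded .gif files are really redirect pages, here is a minimal Python sketch. It relies only on the fact that genuine GIF files begin with the bytes GIF87a or GIF89a. The function names and the HTML heuristics are my own assumptions for illustration, not part of Offline Explorer.

```python
from pathlib import Path
from typing import List

# Every valid GIF file starts with one of these two signatures.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def classify_download(data: bytes) -> str:
    """Return 'gif', 'html', or 'unknown' based on the first bytes of a file."""
    if data.startswith(GIF_SIGNATURES):
        return "gif"
    head = data[:512].lstrip().lower()
    # Rough heuristic (an assumption): redirect stubs look like small HTML pages.
    if head.startswith((b"<!doctype", b"<html")) or b"<meta" in head or b"<script" in head:
        return "html"
    return "unknown"


def find_stub_gifs(root: str) -> List[Path]:
    """List *.gif files under root whose content is not actually GIF data."""
    stubs = []
    for path in Path(root).rglob("*.gif"):
        with path.open("rb") as fh:
            if classify_download(fh.read(512)) != "gif":
                stubs.append(path)
    return stubs
```

Calling find_stub_gifs on the download folder should list the files Photoshop refuses to open. Those can then be opened in a text editor to find the real image location.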
I am trying to do the same thing. Is there any way to make the exported files go into one directory instead of into separate date-stamped folders?
Many thanks
URL:
http://web.archive.org/web/*
Replace:
http://web.archive.org/web/**/*
With:
http://web.archive.org/web/*
Apply to:
Filenames
Then redownload the project and export it.
Oleg.
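To illustrate what a substitution like this is meant to achieve, here is a small Python sketch that removes the timestamp segment from a Wayback Machine URL, so that copies captured at different times map to the same name. It is not how Offline Explorer evaluates the rule internally. The regular expression, the function name strip_timestamp, and the handling of an optional two-letter modifier such as im_ after the digits are assumptions.

```python
import re

# Matches the Wayback prefix followed by a 4 to 14 digit timestamp and,
# optionally, a two-letter modifier such as "im_" (assumed format).
WAYBACK_PREFIX = re.compile(
    r"^(https?://web\.archive\.org/web/)\d{4,14}(?:[a-z]{2}_)?/"
)


def strip_timestamp(url: str) -> str:
    """Drop the timestamp folder from a Wayback URL; leave other URLs unchanged."""
    return WAYBACK_PREFIX.sub(r"\1", url, count=1)


if __name__ == "__main__":
    print(strip_timestamp(
        "http://web.archive.org/web/20070819071002/http://www.domain.co.uk/"
    ))
```

With the timestamps removed, files from different capture dates end up under one path, which is why the export no longer splits into date-stamped folders.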
Thanks, that worked. The only problem is that I'm getting thousands of files and pages from years that I don't want. One site is archived for 2007, and I have this in URL Exclusions:
http://web.archive.org/web/2001*/*/
http://web.archive.org/web/2002*/*/
http://web.archive.org/web/2003*/*/
http://web.archive.org/web/2004*/*/
http://web.archive.org/web/2005*/*/
http://web.archive.org/web/2006*/*/
but it still downloads pages from older years than 2007.
Many thanks for your help.
Oleg.
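One possible explanation, sketched in Python: if the exclusion masks are matched like shell-style wildcards (an assumption, Offline Explorer's matching may differ), then a mask ending in */ only matches URLs that end with a slash. Archived pages such as page.html would slip through. The sample URLs and timestamps below are hypothetical.

```python
from fnmatch import fnmatchcase

# The exclusion masks from the post above, for the years 2001 to 2006.
EXCLUSIONS = [f"http://web.archive.org/web/{year}*/*/" for year in range(2001, 2007)]


def is_excluded(url, patterns=EXCLUSIONS):
    """True if the URL matches any mask under shell-style wildcard rules."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


if __name__ == "__main__":
    base = "http://web.archive.org/web/20060615000000/http://www.domain.co.uk/"
    for url in (base, base + "page.html"):
        print(url, is_excluded(url))
    # A mask without the trailing slash covers both cases under the same rules.
    print(is_excluded(base + "page.html", ["http://web.archive.org/web/2006*"]))
```

If the program does behave this way, masks of the form http://web.archive.org/web/2006* (no trailing slash) may be worth trying.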
[Object]
OEVersion=Pro 6.0.0.3658
Type=0
IID=7025
Caption=http://web.archive.org/web/20070819071002/http://www.domain.co.uk/
URL=http://web.archive.org/web/20070819071002/http://www.domain.co.uk/
MVer=5
Lev=10
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
LTMethod=1
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwfwebp
FTVideo.Exts=mpgavianimpegmovflvfliflcvivrmramrvasfasxwmvm1vm2vvobsmilmp4m4v
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaapeoggm4aaif
FTArchive.Exts=7zziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexeiso
FTUDef.Exts=jsaxdcssssivbsdtdxslswfclassent
FTText.B=ooxooo
FTImages.B=ooxooo
FTVideo.B=ooxooo
FTAudio.B=ooxooo
FTArchive.B=ooxooo
FTUDef.B=ooxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,3,0
NotIgnoreLogout=False
RPathIn=www.domain.co.uk x
RProt=255
LastStart=210:95:174:224:153:241:227:64:
LastEnd=172:231:125:225:153:241:227:64:
LastStarted=28/10/2011 19:24:29
LastEnded=28/10/2011 19:24:38
S200=9
SAbr=101
SPar=5
SSav=9
SLast=200
SSiz=267152
SMdf=8
SHTML=9
SSuccDowns=1
LFiles=9
LSize=347024
Stopped=True
Flags=1
SubstsB=aHR0cDovL3dlYi5hcmNoaXZlLm9yZy93ZWIvKglodHRwOi8vd2ViLmFyY2hpdmUub3JnL3dlYi8qKi8qCWh0dHA6Ly93ZWIuYXJjaGl2ZS5vcmcvd2ViLyoJWA0K
ImgDim=0,0,0,0
PrevURL=http://web.archive.org/web/20070819071002/www.domain.co.uk/
SkipURLs=http://web.archive.org/web/2001*/*/http://web.archive.org/web/2002*/*/http://web.archive.org/web/2003*/*/http://web.archive.org/web/2004*/*/http://web.archive.org/web/2005*/*/http://web.archive.org/web/2006*/*/
ConvertRSS=True
Exported=28/10/2011 19:10:24 - D:\directory\domain\
LIndexed=False
IndexFiles=False
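The SubstsB value in these settings looks like Base64. A minimal Python sketch for decoding it follows. Reading the result as tab-separated fields (URL, Replace, With, plus a flag) is my assumption from the decoded text, not a documented format.

```python
import base64

# Copied verbatim from the SubstsB line in the settings above.
SUBSTS_B = (
    "aHR0cDovL3dlYi5hcmNoaXZlLm9yZy93ZWIvKglodHRwOi8vd2ViLmFyY2hpdmUub3JnL3dl"
    "Yi8qKi8qCWh0dHA6Ly93ZWIuYXJjaGl2ZS5vcmcvd2ViLyoJWA0K"
)


def decode_substs(value: str) -> list:
    """Decode the Base64 text and split each line into tab-separated fields."""
    text = base64.b64decode(value).decode("ascii")
    return [line.split("\t") for line in text.splitlines() if line]


if __name__ == "__main__":
    for rule in decode_substs(SUBSTS_B):
        print(rule)
```

The decoded fields appear to correspond to the URL, Replace and With values of the substitution rule suggested earlier in the thread, so the rule does seem to be stored in the project.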
Cheers,
Tim
Oleg.