I use OE a lot for news. I put news sites in the URL field and use a download level of 1 with a timer of about 10 minutes to download newly published articles. The website below does not obey the "Skip existing files on levels higher than 1" setting and re-downloads articles that have already been downloaded. When I search with "Recently loaded files only" checked in the "Find Contents" dialog, old news articles are mixed in with the new results. Please help.
[Object]
OEVersion=Pro 7.0.4408
Type=0
IID=8212
Caption=DNAInfo
URL=http://www.dnainfo.com/new-york/index/all
http://www.dnainfo.com/new-york/
Additional=DisableScripts;DisableJava;SkipIFrames;donotparseexistingfiles
Channels=1
MVer=5
Lev=1
When=5
Minute=10
Weekday=257
FMGroup=3
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwfwebp
FTVideo.Exts=mpgavianimpegmovflvfliflcvivrmramrvasfasxwmvm1vm2vvobsmilmp4m4v
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaapeoggm4a
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexe
FTUDef.Exts=jsaxdcssssivbsdtdxslswfclassent
FTText.B=ooxooo
FTImages.B=xoxooo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,3,0,0,0,0,0,0,0,0
NotIgnoreLogout=False
RPathIn=/201 x
RProt=255
LastStart=135:152:135:123:220:175:228:64:
LastEnd=44:37:163:123:220:175:228:64:
PrjStart=211:163:51:212:97:147:228:64:
LastStarted=12/28/2015 9:21:42 PM
LastEnded=12/28/2015 9:21:43 PM
S200=4
S304=42
SPar=4
SSav=4
SLast=304
SSiz=361845
SMdf=4
SHTML=3
SSuccDowns=708
LFiles=46
LSize=105534
Flags=1
ImgDim=0,0,0,0
PrevURL=http://www.dnainfo.com/new-york/index/all
ConvertWWW=False
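As an aside, a Project dump like the one above is just plain key=value text (plus the [Object] section header), so it can be inspected with a few lines of Python. This is only a sketch for reading the dump; the field names come from the listing above:

```python
# Sketch: read an Offline Explorer project dump as key=value pairs.
# Assumes the simple "[Object]" / "Key=Value" layout quoted above.
def parse_project(text):
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip section headers like [Object]
        key, _, value = line.partition("=")
        fields[key] = value
    return fields

dump = """[Object]
Caption=DNAInfo
Lev=1
Minute=10
"""
project = parse_project(dump)
print(project["Caption"], project["Lev"], project["Minute"])  # DNAInfo 1 10
```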
One idea is to duplicate this Project: select it, press Ctrl+C, then Ctrl+V, and delete the first copy. Does this change its download behavior?
Best regards,
Oleg Chernavin
MP Staff
I solved the problem: I used URL Substitutes to rename the downloaded files to *.htm locally.
My problem was that some existing files were being downloaded again despite the "Skip above 0" option being set (I wrote "Skip above 1" by mistake in the original post, but it is correct in the Project).
Since the pages have no extension (.htm or similar), OE was creating a directory with the name of the page and putting a "default.htm" file in that directory. OE would not recognize that the file had already been downloaded, and would download it again. This only happened to one or two pages, for some reason. (Bug?)
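The mismatch can be illustrated with a small sketch (my own illustration of the behavior described above, not OE's actual code): an extensionless URL gets saved on disk as `<path>\default.htm`, so a naive existence check that maps the URL to a plain file path never finds it and re-downloads the page:

```python
import os

# Hypothetical illustration of the mismatch: an extensionless URL is
# SAVED as "<path>/default.htm", but a naive existence check looks for
# "<path>" as a plain file and misses it, so the page downloads again.
def saved_path(url, root="c:/download"):
    path = url.replace("http://", "")
    if not os.path.splitext(path)[1]:        # no .htm/.html extension
        path = path.rstrip("/") + "/default.htm"
    return f"{root}/{path}"

def naive_local_path(url, root="c:/download"):
    return f"{root}/{url.replace('http://', '')}"

url = "http://www.dnainfo.com/new-york/20160114/upper-west-side/article"
print(saved_path(url))        # ...article/default.htm  <- what is on disk
print(naive_local_path(url))  # ...article              <- what gets checked
```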
Thanks.
Oleg.
c:\download\www.dnainfo.com\new-york\20151027\greenwich-village\watch-harlem-globetrotters-stomp-team-up-on-greenwich-village-courts
c:\download\www.dnainfo.com\new-york\20151103\richmond-hill\gunman-threatens-chinese-food-employee-over-chicken-wing-combo-worker-says
c:\download\www.dnainfo.com\new-york\20151112\lower-east-side\man-wanted-for-rape-attempted-assaults-manhattan-police-say
c:\download\www.dnainfo.com\new-york\20160114\central-harlem\13-things-do-your-manhattan-neighborhood-this-weekend
c:\download\www.dnainfo.com\new-york\20160114\park-slope\open-house-agenda-3-top-floor-apartments-see-this-weekend\default.htm
c:\download\www.dnainfo.com\new-york\20160114\park-slope\open-house-agenda-3-top-floor-apartments-see-this-weekend\slideshow\683873
c:\download\www.dnainfo.com\new-york\20160114\tompkinsville\5-things-for-you-do-staten-islands-neighborhoods-this-weekend
c:\download\www.dnainfo.com\new-york\20160114\upper-west-side\what-its-like-be-black-civil-war-re-enactor\default.htm
c:\download\www.dnainfo.com\new-york\20160114\upper-west-side\what-its-like-be-black-civil-war-re-enactor\slideshow\683972
c:\download\www.dnainfo.com\new-york\20160114\west-harlem\8-ways-commemorate-martin-luther-king-jr-day-city
c:\download\www.dnainfo.com\new-york\20160115\bed-stuy\condo-prices-fall-18-percent-bed-stuy-bushwick-crown-heights-report
c:\download\www.dnainfo.com\new-york\20160115\bed-stuy\decrease-bed-stuy-shootings-for-2015-is-unprecedented-nypd-chief-says
c:\download\www.dnainfo.com\new-york\20160115\brooklyn-heights\hidden-cocktail-bar-inspired-by-marie-antoinette-opens-brooklyn\default.htm
c:\download\www.dnainfo.com\new-york\20160115\brooklyn-heights\hidden-cocktail-bar-inspired-by-marie-antoinette-opens-brooklyn\slideshow\684298
c:\download\www.dnainfo.com\new-york\20160115\brownsville\4-of-5-suspects-released-brownsville-playground-rape-case
c:\download\www.dnainfo.com\new-york\20160115\bushwick\6-new-cafs-bars-gyms-check-out-greenpoint-wburg-bushwick
c:\download\www.dnainfo.com\new-york\20160115\central-harlem\charles-rangel-not-impressed-with-candidates-running-replace-him
c:\download\www.dnainfo.com\new-york\20160115\central-harlem\meet-candidates-running-replace-charles-rangel-congress
c:\download\www.dnainfo.com\new-york\20160115\downtown-brooklyn\1066-foot-tall-skyscraper-could-rise-downtown-brooklyn
c:\download\www.dnainfo.com\new-york\20160115\midtown\14-subway-lines-slated-for-service-changes-this-weekend
c:\download\www.dnainfo.com\new-york\20160115\midtown\man-steals-entire-essie-nail-polish-collection-from-duane-reade-police-say
c:\download\www.dnainfo.com\new-york\20160115\park-slope\raccoon--rat-infested-trash-pile-defeated-by-persistent-park-slopers
c:\download\www.dnainfo.com\new-york\20160115\upper-east-side\cast-your-vote-on-what-central-park-statue-should-be-sculpted-out-of-ice
c:\download\www.dnainfo.com\new-york\default.htm
All the files ending in "default.htm" are downloaded again every time the Project runs, even though I don't want them re-downloaded (I only want newly published news articles). It looks like it has something to do with the "slideshow" subdirectory being created.
The fix is URL Substitutes:
Apply to: Filename
URL: *2016*
Replace: *
With: *.htm
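As a sketch of what that substitute does (this is my reading of the wildcards in the rule above, not OE's internal matcher): any filename matching *2016* has the whole name (*) replaced with itself plus .htm, so the page is stored as a regular .htm file instead of a directory holding default.htm:

```python
import fnmatch

# Sketch of the URL Substitute rule above: for filenames matching
# "*2016*", replace "*" (the whole name) with "*.htm", i.e. append
# ".htm" so the page is saved as a file rather than a directory.
def apply_substitute(filename, pattern="*2016*", suffix=".htm"):
    if fnmatch.fnmatch(filename, pattern) and not filename.endswith(suffix):
        return filename + suffix
    return filename

name = "www.dnainfo.com/new-york/20160115/midtown/article"
print(apply_substitute(name))
# -> www.dnainfo.com/new-york/20160115/midtown/article.htm
```

A 2015-dated page would be left untouched by this rule, which matches the rule only targeting URLs containing "2016".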
Hope this helps.
http://www.metaproducts.com/download/betas/opsetup.exe
Oleg.