Content Filters don't work with certain sites

The GermRod
04/05/2010 07:23 pm
Hi!

Offline Explorer Pro's Content Filters feature does not work with certain Web sites. Here are two examples.

I created a project which checks TMZ for updates. When a new story is posted, the project should go bold in the Projects tab. The filter text is a piece of code that only appears in the print version of the page. OE does not detect said filter:

[Object]
OEVersion=Pro 5.8.0.3174
Type=0
IID=7360
Caption=TMZ
URL=http://www.tmz.com/
Channels=1
Additional=DisableScripts;DisableJava;SkipIFrames;DoNotParseExistingFiles
MVer=5
Lev=1
When=5
Minute=3
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FMGroup=3
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovflvfliflcvivrmramrvasfasxwmvm1vm2vvobsmilmp4
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexe
FTUDef.Exts=jscssssivbsdtdxslswfclassent
FTText.B=ooxooo
FTImages.B=xoxooo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=ooxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,3,0
NotIgnoreLogout=False
RSrvsBx=1
RPathIn=2010 x
RFileIn=print x
RProt=255
LastStart=131:23:125:149:57:170:227:64:
LastEnd=166:218:128:149:57:170:227:64:
LastStarted=4/5/2010 7:11:16 PM
LastEnded=4/5/2010 7:11:16 PM
S200=1
S304=26
SPar=1
SSav=1
SLast=304
SSiz=130866
SMdf=1
LFiles=27
LSize=26975
Flags=1
CFKeywords="show-post-print-style.css"
CFFlags=32
ImgDim=0,0,0,0
PrevURL=http://www.tmz.com/
ConvertRSS=True
IPAddr=-1990673203
LIndexed=False
IndexFiles=False

====================================================

This project should go bold when a 5.0+ earthquake happens, but gives false positives because the filter feature doesn't work properly:

[Object]
OEVersion=Pro 5.8.0.3174
Type=0
IID=7349
Caption=USGS
URL=http://earthquake.usgs.gov/eqcenter/recenteqsww/Quakes/quakes_big.php
Channels=1
MVer=5
Lev=1
When=5
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FMGroup=3
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovflvfliflcvivrmramrvasfasxwmvm1vm2vvobsmilmp4
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexe
FTUDef.Exts=jscssssivbsdtdxslswfclassent
FTText.B=ooxooo
FTImages.B=xoxooo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=xoxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,3,0
NotIgnoreLogout=False
RPathIn=recenteqsww/quakes x
RProt=255
LastStart=247:107:0:131:57:170:227:64:
LastEnd=214:147:13:131:57:170:227:64:
LastStarted=4/5/2010 7:08:01 PM
LastEnded=4/5/2010 7:08:02 PM
S304=35
SPar=35
SLast=304
LFiles=35
Flags=1
CFKeywords="Latest Earthquakes M5.0+ in the World - Past 7 days"
CFFlags=83
ImgDim=0,0,0,0
PrevURL=http://earthquake.usgs.gov/eqcenter/recenteqsww/Quakes/quakes_big.php
ConvertRSS=True
IPAddr=2122227916
LIndexed=False
IndexFiles=False

There are plenty of sites where the filters work, and when they do, they work flawlessly. But the filters also fail on a large number of sites.

(btw, I e-mailed support regarding this problem about 3 weeks ago from the same e-mail I listed when posting this message)
Oleg Chernavin
04/06/2010 04:47 am
I am not sure what is wrong with the first project - it looks like it doesn't save the files correctly.

The second one is simple. The project always downloads the first page, and this page makes the Project become "bold" because the server always reports it as changed. I suggest checking the "Check file size" box in the Properties dialog. This will make the project "bold" only when the page actually changes.
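
To illustrate the idea behind the size check, here is a minimal Python sketch. It is not Offline Explorer's actual logic; the function name and parameters are made up for this example, and it only assumes that a 304 response means "not modified" and that a size comparison is applied on top of whatever the server reports.

```python
def should_mark_updated(status_code, old_size, new_size, check_file_size):
    """Decide whether a re-downloaded page counts as updated (hypothetical helper)."""
    if status_code == 304:
        # The server answered "Not Modified", so nothing changed.
        return False
    if not check_file_size:
        # Without the size check, trust the server: any full response is a change.
        return True
    # With the size check on, the byte count must differ from the saved copy.
    return old_size != new_size


# A server that always answers 200 triggers an update every time...
print(should_mark_updated(200, 48709, 48709, check_file_size=False))  # True
# ...unless the size check is on and the size is unchanged.
print(should_mark_updated(200, 48709, 48709, check_file_size=True))   # False
```

A size comparison will miss an edit that happens to leave the byte count the same, so it is a heuristic rather than a guarantee.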

I am sorry for the missing e-mail. It looks like modern anti-spam filters block so many messages nowadays. The forums are more reliable.

Best regards,
Oleg Chernavin
MP Staff
The GermRod
04/20/2010 07:24 am
Hi Oleg!

This is a very simple project that doesn't do anything useful but exemplifies the problem.

OE is unable to detect the word "reuters" on Reuters' home page. (It's on there a whole bunch of times.) In fact, it can't detect ANY letter, number, or space (" ").

It would be great if this bug were fixed for OEP 6, or hopefully sooner... :)

[Object]
OEVersion=Pro 5.8.0.3174
Type=0
IID=7388
Caption=Reuters
URL=http://www.reuters.com/
Channels=1
Additional=DisableScripts
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovflvfliflcvivrmramrvasfasxwmvm1vm2vvobsmilmp4
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexe
FTUDef.Exts=jscssssivbsdtdxslswfclassent
FTText.B=ooxooo
FTImages.B=xoxooo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=xoxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,3,0
NotIgnoreLogout=False
RProt=255
LastStart=177:156:92:236:251:171:227:64:
LastEnd=2:148:94:236:251:171:227:64:
LastStarted=4/19/2010 8:56:32 PM
LastEnded=4/19/2010 8:56:32 PM
SPar=1
SLast=200
SSiz=48709
LSize=13744
CFKeywords=reuters
ImgDim=0,0,0,0
PrevURL=http://www.reuters.com/
ConvertRSS=True
IPAddr=1747455552
LIndexed=False
IndexFiles=False

By comparison, this second project works exactly as intended. It doesn't save the first page, only the print versions of the articles that the page links to, and it recognizes them by a piece of code that appears only in the print pages. Because the first page is never saved, the project only goes bold when a new article is posted, regardless of whether the first page changes size or the server reports it as changed. This is what I was trying to do with the first two projects I posted.
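
Roughly, the behavior I'm after can be sketched like this in Python. This is only an illustration of the idea, not how OE implements it: the function name, the example.com URLs, and the exact matching rules (plain case-sensitive substring tests) are assumptions.

```python
def passes_filters(url, body, filename_part, keyword):
    """Keep a page only if its file name and its content both match (hypothetical)."""
    filename = url.rsplit("/", 1)[-1]  # text after the last slash
    return filename_part in filename and keyword in body


# Made-up pages: (URL, page source)
pages = [
    ("http://example.com/", "<html>front page, changes all the time</html>"),
    ("http://example.com/article.php?id=1", "<p>normal article view</p>"),
    ("http://example.com/print.php?id=1", '<p style="color: #000000;">print view</p>'),
]

saved = [url for url, body in pages
         if passes_filters(url, body, "print", "color: #000000;")]
print(saved)  # ['http://example.com/print.php?id=1']
```

Since the front page never passes the filter, only a newly saved print page would make the project go bold.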

Again, there are many sites that OE handles well, but also many that OE can't handle.

[Object]
OEVersion=Pro 5.8.0.3174
Type=0
IID=7870
Caption=AFP
URL=http://www.breitbart.com/content.php?regsrc=AFP&max=20
Channels=1
Additional=DisableScripts;DisableJava;SkipIFrames;DoNotParseExistingFiles
MVer=5
Lev=1
When=5
Weekday=257
LimTSize=10000
LimNumber=5000
LimTime=100
FMGroup=3
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=gifjpgjpegtiftiffxbmfifbmppngipxjp2j2cj2kwbmplwf
FTVideo.Exts=mpgavianimpegmovflvfliflcvivrmramrvasfasxwmvm1vm2vvobsmilmp4
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaape
FTArchive.Exts=ziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexe
FTUDef.Exts=jscssssivbsdtdxslswfclassent
FTText.B=ooxooo
FTImages.B=xoxooo
FTVideo.B=xoxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=xoxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,3,0
NotIgnoreLogout=False
RFileIn=cngtx- xx
RProt=255
LastStart=145:47:181:45:253:171:227:64:
LastEnd=247:84:196:45:253:171:227:64:
LastStarted=4/19/2010 9:53:02 PM
LastEnded=4/19/2010 9:53:02 PM
S304=40
SPar=1
SLast=304
LFiles=40
LSize=29121
CFKeywords="color: #000000;"
CFFlags=32
SubstsB=KglhcnRpY2xlLnBocAlwcmludC5waHANCg==
ApplyAllSubsts=True
ImgDim=0,0,0,0
PrevURL=http://www.breitbart.com/content.php?regsrc=AFP&max=20
ConvertRSS=True
IPAddr=-1194255294
LIndexed=False
IndexFiles=False
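
For anyone reading the dump above: the SubstsB value is Base64, and decoding it shows the URL substitution that turns article links into print links. The short Python sketch below decodes it. Reading the three tab-separated fields as "URL mask, text to find, replacement" is my assumption, and the sample URL is made up.

```python
import base64

substs_b = "KglhcnRpY2xlLnBocAlwcmludC5waHANCg=="

# Decode the Base64 payload; it is plain ASCII with tabs and a trailing CRLF.
decoded = base64.b64decode(substs_b).decode("ascii")
fields = decoded.strip().split("\t")
print(fields)  # ['*', 'article.php', 'print.php']

# Assumed meaning of the fields: URL mask, text to find, replacement text.
mask, find, replace_with = fields


def apply_subst(url):
    """Apply the substitution to a URL (the mask '*' is treated as 'any URL')."""
    return url.replace(find, replace_with)


# Hypothetical article URL, for illustration only.
print(apply_subst("http://example.com/article.php?id=123"))
# http://example.com/print.php?id=123
```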
Oleg Chernavin
04/20/2010 07:25 am
Thank you for the sample! I fixed the problem. Please update your oe.exe file to the new version:

http://www.metaproducts.com/download/betas/OEP3194.ZIP

Oleg.
The GermRod
04/23/2010 07:47 pm
It works great now!

Thanks.
Oleg Chernavin
04/25/2010 04:42 am
You are welcome!

Oleg.