Setting up Content Filters
|Stefan||09/01/2016 07:10 am|
I'm trying to download a website and skip (don't save) the files that contain these two strings:
<title> 404
<TITLE>302 File moved</TITLE>
I've set the "Content Filters" dialogue in Project Properties like this:
-- Text keywords
Keywords: "<title> 404" "<TITLE>302 File moved</TITLE>"
Search for all keywords: checked
Search inside HTML tags: checked
-- When keywords are not found in the page
Save these pages: checked
All the other checkboxes are left unchecked.
My assumption is that it works like this:
1) Page is downloaded
2) Parser searches for "<title> 404" and "<TITLE>302 File moved</TITLE>"
3) If neither of them is found, the page is saved; otherwise it's discarded
However, even with these settings, the pages containing "<TITLE>302 File moved</TITLE>" are still saved, so I guess it works in a different way. Can you please help me find out where my settings are wrong?
|Oleg Chernavin||09/01/2016 07:13 am|
You need to uncheck the "Search for all keywords" box. When it is checked, Offline Explorer requires both keywords to be present in the same web page, and as I understand it, only one of these lines can appear in a given page.
So, it is either 302 or 404.
Would this work?
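The difference described above can be modelled roughly as follows. This is a minimal sketch, not Offline Explorer's code: the function `should_save` and its parameters are hypothetical, and it assumes a plain, case-sensitive substring search over the page source (how the program treats letter case is not stated in this thread).

```python
def should_save(page: str, keywords: list[str], search_all: bool) -> bool:
    """Hypothetical model of "When keywords are not found in the page -> Save these pages"."""
    # Assumption: a plain, case-sensitive substring search over the page source.
    hits = [kw in page for kw in keywords]
    # "Search for all keywords" checked: the keywords count as found only if
    # every one of them is in the same page (logical AND).
    # Unchecked: one occurrence is enough (logical OR).
    found = all(hits) if search_all else any(hits)
    # Only pages where the keywords were judged "not found" are saved.
    return not found


KEYWORDS = ["<title> 404", "<TITLE>302 File moved</TITLE>"]
moved = "<html><head><TITLE>302 File moved</TITLE></head></html>"

print(should_save(moved, KEYWORDS, search_all=True))   # True: saved, because "<title> 404" is missing
print(should_save(moved, KEYWORDS, search_all=False))  # False: discarded
```

With the box checked, a 302 page is still saved because it cannot also contain the 404 title, which is what Stefan observed.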
|Stefan||09/01/2016 11:00 am|
Thank you, Oleg, I'll try it out.
However, I find this dialogue (especially the "Search for all keywords" option) counter-intuitive from the user's standpoint. I'll try to explain.
Let's say I have two strings that I want to filter out. So I put them in the "Keywords" field and then check "Search for all keywords", because that's what I want to do - search for all of them and only do the action if none is found. Then I go to "When keywords are not found in a page" and check "Save these pages".
So in my thinking, I checked the options that mean "search for all keywords and when none of them is found, save page". However, it doesn't work that way, which might be slightly confusing.
I think it would be way more intuitive if there were two separate checkboxes - "Only apply when all keywords are found" in "When keywords are found in a page" and "Only apply when none of the keywords are found" in "When keywords are not found in a page".
Or, even better, radio buttons to switch between logical AND and logical OR. In "When keywords are found in a page" you could then choose between "Only when all keywords are found" and "When at least one keyword is found", and in "When keywords are not found in a page" between "Only when none of the keywords are found" and "When at least one keyword is not found".
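The suggested AND/OR switch can be sketched as one choice per section of the dialogue. The `Match` enum and both functions are hypothetical names used only to illustrate the proposal; they are not taken from Offline Explorer, and matching is again assumed to be a plain substring search.

```python
from enum import Enum


class Match(Enum):
    """Hypothetical radio-button choice for one section of the dialogue."""
    ALL = "all"  # "Only when all keywords are found" / "Only when none of the keywords are found"
    ANY = "any"  # "When at least one keyword is found" / "When at least one keyword is not found"


def keywords_found(page: str, keywords: list[str], mode: Match) -> bool:
    """Condition for the "When keywords are found in a page" section."""
    hits = [kw in page for kw in keywords]
    return all(hits) if mode is Match.ALL else any(hits)


def keywords_not_found(page: str, keywords: list[str], mode: Match) -> bool:
    """Condition for the "When keywords are not found in a page" section."""
    misses = [kw not in page for kw in keywords]
    return all(misses) if mode is Match.ALL else any(misses)


KEYWORDS = ["<title> 404", "<TITLE>302 File moved</TITLE>"]
moved = "<html><head><TITLE>302 File moved</TITLE></head></html>"

print(keywords_not_found(moved, KEYWORDS, Match.ALL))  # False: one keyword is present, so the page is not saved
print(keywords_not_found(moved, KEYWORDS, Match.ANY))  # True: the 404 keyword is missing
```

Because "none of the keywords are found" equals `not any(hits)` and "at least one keyword is not found" equals `not all(hits)`, the ANY option in the second section would reproduce a checked "Search for all keywords" as Oleg describes it, while ALL gives the behaviour Stefan expected.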
That way, it would be easy to understand what the filter logic is going to do and how it will apply the rules.
Just a suggestion :)
Thanks again for helping me out.
|Oleg Chernavin||09/26/2016 07:08 pm|
Yes, I like it: it gives more logic and flexibility.
I added these options. Can you please take a look at the updated version:
Please let me know if it is OK or anything should be improved/fixed. Thank you!