How do you download only those pages with "TITLE:" in the text?

Len Lydik
10/21/2004 06:17 pm
How do you download only those pages with "TITLE:" in the text?
10/21/2004 10:40 pm
> How do you download only those pages with "TITLE:" in the text?

How would you use this filter?

OEP can save to your HDD only those pages that contain "TITLE:" in their content.
Use the "Content Filters" for this.

But AFAIK you can *not* prevent OE from downloading and parsing files with other content (OE downloads them, but does not save them to disk).
I guess that you are looking for a different filter:

"Do not follow any links on pages that do not have "TITLE:" in their content
AND
Save to disk only pages that have "TITLE:" in their content"

That would be a really useful filter.

Perhaps Oleg can implement such a filter?
Oleg Chernavin
10/22/2004 04:31 am
Content Filters already support this. Place TITLE: in the keywords field and keep all checkboxes in the section unchecked. This will load all pages and save only those that contain the keyword.
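
The behaviour described above (load every page, keep only the ones that contain the keyword) can be pictured with a small sketch. This is not Offline Explorer's code; the function name `pages_to_save`, the sample URLs, and the case-sensitive substring match are all assumptions made for illustration.

```python
def pages_to_save(pages, keyword="TITLE:"):
    """Return the URLs of already-downloaded pages whose text contains keyword.

    `pages` maps a URL to the page text. Every page in it has been
    downloaded; the filter only decides which ones are kept on disk.
    """
    return [url for url, text in pages.items() if keyword in text]


# Hypothetical sample data: three downloaded pages, two contain the keyword.
downloaded = {
    "a.htm": "TITLE: first chapter",
    "b.htm": "no keyword here",
    "c.htm": "intro ... TITLE: second chapter",
}
kept = pages_to_save(downloaded)
```

All three pages are fetched in this sketch; only `a.htm` and `c.htm` end up in `kept`.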

Best regards,
Oleg Chernavin
MP Staff
10/22/2004 06:47 am
> Content Filters already support this.

What exactly? Len said that he wants to *download* only the pages that contain "TITLE:" in the text.
He didn't say anything about *saving* files on disk; Len could clarify this...

> Place TITLE: in the keywords field and keep all checkboxes in the section unchecked.
> This will load all pages and save only those that contain the keyword.

This is the same as what I said before. "This will load all pages..." -> this should be avoided (in most cases).
The filter doesn't work like this:

"Do not follow any links on pages that do not have "TITLE:" in their content
AND
Save to disk only pages that have "TITLE:" in their content"

Or am I wrong?

I think that this filter could be very useful. Do you agree?
Could you add such a filter type to the "Content Filters"?

What do you think?

Thank you!
Oleg Chernavin
10/22/2004 06:52 am
Do you mean to follow links whose tag contains TITLE:, like:

<a href="somepage.htm">TITLE: aaa bbb</a>

Is that what you need?
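
The link-text rule asked about above (follow a link only when the text inside its <a> tag contains the keyword) could be sketched roughly like this with Python's standard `html.parser`. The class `LinkTextFilter` and the function `links_with_text` are hypothetical names, and nothing here reflects how Offline Explorer parses pages.

```python
from html.parser import HTMLParser


class LinkTextFilter(HTMLParser):
    """Collect the href of every <a> whose visible text contains a keyword."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword
        self.matches = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self.keyword in "".join(self._text):
                self.matches.append(self._href)
            self._href = None


def links_with_text(html, keyword="TITLE:"):
    """Return hrefs of links whose link text contains keyword."""
    parser = LinkTextFilter(keyword)
    parser.feed(html)
    parser.close()
    return parser.matches


sample = '<a href="somepage.htm">TITLE: aaa bbb</a> <a href="other.htm">ccc</a>'
followed = links_with_text(sample)
```

Only `somepage.htm` is selected from the sample; `other.htm` is skipped because its link text lacks the keyword.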

Oleg.
Len Lydik
10/22/2004 12:10 pm
Oleg's solution seems to be working. I'm just looking to save only the files with the string "TITLE:" in the content.

I don't know how OEP would accomplish this without downloading all files (how could it analyze a file it hasn't downloaded).
10/22/2004 06:32 pm
OK. Maybe I wasn't clear enough. I will describe it once more.

> (how could it analyze a file it hasn't downloaded).

Of course OE can't analyze a file without downloading it.

> I don't know how OEP would accomplish this without downloading all files

"all files" -> That's the thing: OE shouldn't download *all* unwanted files (in most cases), only the files that it absolutely needs to analyze. And this is what my filter would do. OE wouldn't follow (wouldn't download) any links on pages that do not have "TITLE" in their content. In this way OE would download unwanted files only *one* level deeper than the wanted files. But currently OE downloads *every* page at every level.

I will try to explain it with an example:

-------------------------------
Level 0, page A:
Has TITLE in its content
Links in page A:
B
C
D

Level 1, page B:
Has TITLE in its content
Links in page B:
E
F
G

Level 1, page C:
Does *not* have TITLE in its content
Links in page C:
H
I
J

Level 1, page D:
Does *not* have TITLE in its content
Links in page D:
K
L
M

Pages on Level 2 which have TITLE in their content:
E, F, G

Pages on Level 2 which do *not* have TITLE in their content:
H, I, J, K, L, M
-------------------------------

The result with the current filter:
OE downloads:
A, B, C, D, E, F, G, H, I, J, K, L, M

OE saves on disk:
A, B, E, F, G

---

The result with my filter:
OE downloads:
A, B, C, D, E, F, G

OE saves on disk:
A, B, E, F, G

---

Compare the results:
With my filter OE does *not* download the pages H, I, J, K, L, M
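
The comparison above can be reproduced with a small simulation. This is only a sketch of the two rules applied to the example pages A to M, not Offline Explorer's implementation; the names `LINKS`, `HAS_TITLE`, and `crawl` are made up for illustration, and a breadth-first visiting order is assumed.

```python
from collections import deque

# Link structure and keyword flags taken from the example above.
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["E", "F", "G"],
    "C": ["H", "I", "J"],
    "D": ["K", "L", "M"],
}
HAS_TITLE = {"A", "B", "E", "F", "G"}


def crawl(start, follow_only_from_matches):
    """Return (downloaded, saved) page lists for one of the two filter rules.

    follow_only_from_matches=False: current filter (follow every link).
    follow_only_from_matches=True: proposed filter (do not follow links
    found on pages that lack the keyword).
    """
    downloaded, saved = [], []
    queue, seen = deque([start]), {start}
    while queue:
        page = queue.popleft()
        downloaded.append(page)          # a page must be fetched to be checked
        matches = page in HAS_TITLE
        if matches:
            saved.append(page)
        if follow_only_from_matches and not matches:
            continue                     # proposed rule: stop at this page
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return downloaded, saved


current = crawl("A", follow_only_from_matches=False)
proposed = crawl("A", follow_only_from_matches=True)
```

In this sketch both runs save A, B, E, F, G; the current rule downloads all thirteen pages, while the proposed rule stops after A to G.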

I think that both filters are useful (the current and my suggestion).
For example, my filter would be useful when you want to follow a specific chapter on a site (like the example above).

@ Oleg

I hope that you agree and you can implement such a filter.

> Do you mean to follow links whose tag contains TITLE:, like:
>
> <a href="somepage.htm">TITLE: aaa bbb</a>
>
> Is that what you need?

It's not the filter that I mentioned. But I have indeed looked for such a filter before, without success.
I'm pretty sure that many people would love to have such a filter, me included. ;-)

So, this would be the third filter in this topic.
I hope that you can implement the 2 new filters. :-)

Thanks in advance!
Oleg Chernavin
10/25/2004 12:35 pm
OK. I see, so you need a new checkbox, like "Do not follow links in pages that do not contain the above keywords". Is that right?

Oleg.