I'd like to download all of the pages in a website plus only some of the pages that the site links to (i.e., only the linked pages that contain the word "Bush").
I'd then like OEP to add new qualifying pages to the project daily, weekly or monthly without removing saved pages, even if those saved pages have since been removed from their websites.
It seems that I'm doing two downloads:
1. The entire website with only pages on the website's server regardless of content; and,
2. Linked pages on other servers down to two levels IF they have the word "Bush" in them.
a. Where the content of the linked page contains the word "Bush"; and,
b. Even if the page in the starting website does not contain that word.
For example, these pages are part of the starting website ("has" refers to content):
1. (Website page has "bush") AND (Linked page has "Bush") - Download
2. (Website page has "bush") AND (Linked page does NOT have "Bush") - No Download
3. (Website page does NOT have "bush") AND (Linked page has "Bush") - Download
4. (Website page does NOT have "bush") AND (Linked page does NOT have "Bush") - No Download
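The four cases above reduce to one rule: only the content of the linked page matters. Here is a minimal Python sketch of that rule, not anything OEP actually exposes. The function names are hypothetical, and the case-insensitive match is an assumption (the cases above mix "bush" and "Bush").

```python
def contains_keyword(text: str, keyword: str = "Bush") -> bool:
    """Case-insensitive substring test (the case-insensitivity is an assumption)."""
    return keyword.lower() in text.lower()


def should_download_linked_page(start_page_text: str,
                                linked_page_text: str,
                                keyword: str = "Bush") -> bool:
    """Decide whether a page linked from the starting website is saved.

    start_page_text is accepted only to mirror the four cases; it is
    deliberately ignored, because the decision depends on the linked page alone.
    """
    return contains_keyword(linked_page_text, keyword)


# The four cases, in the same order as the list above.
cases = [
    ("bush policy", "Bush speech"),    # 1. Download
    ("bush policy", "weather news"),   # 2. No Download
    ("sports page", "Bush speech"),    # 3. Download
    ("sports page", "weather news"),   # 4. No Download
]
decisions = [should_download_linked_page(start, linked) for start, linked in cases]
```

Pages on the starting server are not covered by this function, since those are all saved regardless of content.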
> Do you want to load the pages that contain "Bush" in links, like http://www.server.com/page_Bush.html, or in the page contents?
> Best regards,
> Oleg Chernavin
> MP Staff
How important is this kind of download for you? If it is really important, I will work on the redesign of Contents Filters.
A. Presently, I find the content filtering choices ambiguous and hard to use:
1. Search for all keywords. ~~Will the condition be met if ANY of the keywords is present, or only if ALL of them are present?
2. Save pages with no keywords in their text. ~~All pages will be saved, except that any page containing even one keyword will not be saved.
3. Do not save pages with keywords in their text.~~Isn’t this the same as the one immediately above?
4. Download graphics files for pages with no keywords in their text?~~I have no idea what this means.
5. Stop downloading when keywords are encountered.~~Does this mean abort the download as soon as one keyword is found? Does this mean pause the download each time a keyword is found? Functionally, how would this be used differently than 2 or 3? When and why would someone use this?
B. I’d like to be able to have OEP download or skip a page depending upon whether the text on the linked page has the following (i.e., “a”, “b”, “c”, etc. represent words):
a OR b OR c
a AND b
a /5 b (i.e. a and b are within 5 words of each other, in any order)
a +5 b (i.e. a and b are within 5 words of each other, but b is after a)
“a x b y c z” (i.e. the phrase “a x b y c z”)
a NOT b (i.e., this would be the least important type of filter)
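To make the requested operators concrete, here is a rough Python sketch of how each one could be evaluated against a page's text. This is only an illustration of the wish list, not OEP's filter engine. All function names are hypothetical, and "within 5 words" is assumed to mean that the word positions differ by at most 5.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def any_of(text, terms):
    """a OR b OR c"""
    words = set(tokenize(text))
    return any(t.lower() in words for t in terms)


def all_of(text, terms):
    """a AND b"""
    words = set(tokenize(text))
    return all(t.lower() in words for t in terms)


def within(text, a, b, n, ordered=False):
    """a /n b (any order) or, with ordered=True, a +n b (b after a)."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    for i in pos_a:
        for j in pos_b:
            distance = j - i
            if ordered and 0 < distance <= n:
                return True
            if not ordered and 0 < abs(distance) <= n:
                return True
    return False


def has_phrase(text, phrase):
    """Exact phrase match, ignoring case and punctuation."""
    words, target = tokenize(text), tokenize(phrase)
    if not target:
        return False
    return any(words[k:k + len(target)] == target
               for k in range(len(words) - len(target) + 1))


def a_not_b(text, a, b):
    """a NOT b: a is present and b is absent."""
    words = set(tokenize(text))
    return a.lower() in words and b.lower() not in words


sample = "The president said Bush would visit Iraq next week"
```

With the sample text, `within(sample, "Bush", "Iraq", 5)` matches in either order, while `within(sample, "Iraq", "Bush", 5, ordered=True)` does not, because "Bush" comes before "Iraq".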
C. Important: Aside from the syntax of the filters, I’d like to download all of certain websites and some, but not all, of the pages they link to. I think that this would be very useful for people in many fields. Examples:
1. One user might want to download the entire Cancer Society site plus the linked pages that deal with liver cancer (but only the linked pages that deal with liver cancer).
2. Another user might want to download four entire political websites and also download (only) those linked pages that deal with Iraq.
I would think that this approach would be useful for many people. Maybe, on the URL filtering section, the portion that says “Load files only within the starting server....” and “Load up to ## links on other servers.” could be followed with: “Load only linked pages that contain one or more of the following words/phrases: ______________________”.
> > Oleg.
I like your product, but I think it'll be my dream software if the following features become available:
1. Is there any way to fully analyze the link? For example, I want to download some pages within the same server, but I also want to download a link that points to another server if the text of the link itself is "Complete Story" (only that link, with no further downloading on that site). On www.linuxtoday.com, for instance, I want to download the page <a href="http://www.pocketpcthoughts.com/index.php?action=expand,41356">Complete Story</a> even though it doesn't come from www.linuxtoday.com. I tried your content filtering but could not make this work.
2. Also for the linuxtoday site, I'm only interested in the links in the main body of the home page. I don't want to download the tabs such as Preference, Search and Contact Us on the left, or those like "Editor's Picks" on the right, even though they are on the same server.
3. "Level limit" is not an ideal way to control how much to download. If we set a small "level limit", a lot of useful pages will not be downloaded; if we set a large one, a lot of irrelevant pages are downloaded as well. It would be ideal if we could download pages the way a human browses. Imagine I'm browsing a web forum: normally I click all the links in the main body of the first page, then I click "Next Page", browse the links in the main body again, and click "Next Page" again, until I have clicked "Next Page" n times. This way I don't lose anything, yet I don't include the irrelevant pages.
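To illustrate the first wish (choosing links by their visible text rather than by their URL), here is a small Python sketch using only the standard library's `html.parser`. It shows the idea, not how OEP parses pages. The class and function names are hypothetical, and the `/search` link in the sample markup is invented for the example.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_with_text(html, wanted):
    """Return the hrefs whose link text equals `wanted` (case-insensitive)."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    return [href for href, text in parser.links
            if text.lower() == wanted.lower()]


page = (
    '<a href="http://www.pocketpcthoughts.com/index.php?action=expand,41356">'
    'Complete Story</a> '
    '<a href="/search">Search</a>'
)
story_links = links_with_text(page, "Complete Story")
```

The same collector could serve the third wish: a crawler would follow only the link whose text is "Next Page", up to n times, instead of relying on a level limit.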
Just my 2 cents,
to allow or disallow such links.
The 3rd wish is not easy to do. I will think about it.