Filter at level X
|Doceave||09/05/2008 03:08 pm|
I have dug through the forums... and I'm still lost.
I have a site with many thousands of pages, even when set to download only two levels. However, I badly need a third level, with the filter made tighter at that point to avoid hitting the 10^25 number of pages!!
Is there a way I can download to two levels, then change the filters such that OE skips through the existing files and only downloads the next level where the URL contains a certain word? For example: "image", "audio", "video" ??
Thanking you in anticipation
|Oleg Chernavin||09/06/2008 07:29 am|
|Sorry, this is not yet possible.
|Doceave||09/06/2008 05:31 pm|
Could data mining be used here? How about downloading to 2 levels, then parsing the downloaded content and saving all the URLs it contains into a single list (mining, if I understand correctly), then downloading this extensive list one additional level with the tighter filters?
Please please help..
|Oleg Chernavin||09/06/2008 05:46 pm|
|Yes, if you manage to extract the necessary links, it is possible to save them in a text file and create a Project with a URL like:
This will load the links from c:\file.txt. Links should be one per line in this file.
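For anyone preparing such a file by script, here is a minimal Python sketch of the filtering step. It assumes the links have already been extracted into a list; the names `filter_links` and `write_link_file` and the keywords are only illustrations, not part of Offline Explorer.

```python
from pathlib import Path


def filter_links(links, keywords):
    """Keep links containing any keyword (case-insensitive), without duplicates."""
    seen = set()
    kept = []
    for link in links:
        link = link.strip()
        if not link or link in seen:
            continue
        if any(k.lower() in link.lower() for k in keywords):
            seen.add(link)
            kept.append(link)
    return kept


def write_link_file(links, path):
    """Write the links to a text file, one per line."""
    Path(path).write_text("\n".join(links) + "\n", encoding="utf-8")
```

A call such as `write_link_file(filter_links(all_links, ["image", "audio", "video"]), r"c:\file.txt")` would then produce the one-link-per-line file described above.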
|Dylan Eave||09/07/2008 03:36 am|
I'm battling with TextPipe to extract the URLs, as the downloaded files made by OE don't have traditional URLs... i.e. TextPipe is not picking them up.
Can you advise me on a way to mine the vast number of URLs in my OE download folder?
Thanks yet again... (This problem is almost solved)
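One possible approach without TextPipe is a short Python script using only the standard library. This is a rough sketch that assumes the download folder contains ordinary HTML files with `href` and `src` attributes, which may not hold for files as Offline Explorer stores them. The names `LinkCollector`, `extract_links` and `mine_folder`, and the `base_url` parameter used to resolve relative links, are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from every tag."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the assumed base URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url=""):
    """Return all links found in one HTML document, in order."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    return parser.links


def mine_folder(folder, base_url=""):
    """Scan every .htm/.html file under folder and return unique links."""
    links = []
    for path in sorted(Path(folder).rglob("*.htm*")):
        text = path.read_text(encoding="utf-8", errors="replace")
        links.extend(extract_links(text, base_url))
    return list(dict.fromkeys(links))  # drop duplicates, keep order
```

The resulting list can then be filtered and saved one link per line.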
|Oleg Chernavin||09/07/2008 04:44 am|
|It is hard for me to suggest anything. TextPipe is a very complex tool. It is powerful and flexible, but tough to learn, and I have hardly worked with it. If you are stuck, can you give me more details on what you are downloading? I will look at it and maybe I will have better suggestions.
It is hard to tell without looking at particular sites.
|Oleg Chernavin||09/07/2008 06:03 am|
|If you only need images downloaded from anywhere, I could suggest the following: set the filters as needed to allow downloading from Wikipedia only, and allow File Filters - Images to be loaded from any site. Level would be 3.
|Dylan Eave||09/07/2008 06:52 am|
My problem with 3 levels is the vast number of pages that are downloaded... 2 levels keeps the downloads relevant to the topic and allows for rapid updates. Also, the images are on HTML pages, and the links do not point directly to the image files...
I really need to figure out how to parse out the URLs and download only further URLs containing "image:"
Thanks (again again) again
|Oleg Chernavin||09/07/2008 08:18 am|
|Please show me examples of pages that you want to download and links that go to level 3.
|Dylan||09/13/2008 03:49 am|
|> Please show me examples of pages that you want to download and links that go to level 3.
Hi Oleg... back again..
I went and figured out TextPipe... Amazingly useful program: I managed to generate a Google sitemap, extract the URLs, then filter to select only URLs pointing to image files...
Anyhow.. now I have a 1MB text file with URLs pointing to images only. So my question now..
If I set OE to download these images, will they be placed into the proper folder structure that already exists, such that links from the exported site will actually point to the images?
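For readers without TextPipe, the last filtering step (keeping only URLs that point to image files) could be sketched in Python as below. This is not what was actually run here; the extension list and the name `is_image_url` are assumptions for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed set of image extensions; adjust to the site being downloaded.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".svg"}


def is_image_url(url):
    """Return True if the URL path ends in a known image extension."""
    path = urlsplit(url).path  # ignores any ?query or #fragment
    return PurePosixPath(path).suffix.lower() in IMAGE_EXTENSIONS
```

Applied as `[u for u in urls if is_image_url(u)]`, this keeps only the image links from a larger list.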
|Oleg Chernavin||09/13/2008 05:30 am|
|Yes, sure. You can feed Offline Explorer this list of image URLs and it will download the images and place them correctly. Links from other Projects will work.
If you want to export them all together - select the Project that downloaded the site and the images Project with Ctrl+click and then do the export.