Download pages depth-first
|Keith Mason||10/19/2004 12:12 pm|
|I have a quick question: is it possible to download a website depth-first? For example, if I have a webpage a.htm, and it contains links to b.htm, m.htm, and s.htm, and b.htm in turn contains links to g.htm and h.htm, OE is going to load in this order: a, b, m, s, g, h. I want to mimic the way a user would manually download, e.g., a->b->g & h, then back up to a to get m & s (which each have their own children).
The way this would work within OE would be to place URLs extracted from a page at the top of the queue instead of at the bottom. To make it work perfectly, I would also have to download one page at a time and insert a delay between pages so that there was time to parse.
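The queueing difference described above can be sketched in a few lines. This is a minimal illustration, not OE's actual implementation; the toy link graph below mirrors the a/b/m/s/g/h example, and the function name is made up:

```python
from collections import deque

# Toy link graph from the example above: a links to b, m, s; b links to g, h.
LINKS = {
    "a.htm": ["b.htm", "m.htm", "s.htm"],
    "b.htm": ["g.htm", "h.htm"],
}

def crawl(start, depth_first=False):
    """Return the order pages would be fetched.

    Breadth-first appends newly extracted URLs to the end of the queue;
    depth-first pushes them onto the front, so a page's links are
    followed before its siblings.
    """
    queue = deque([start])
    seen = set()
    order = []
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        order.append(url)
        children = LINKS.get(url, [])
        if depth_first:
            # Insert at the front while preserving the page's own link order.
            queue.extendleft(reversed(children))
        else:
            queue.extend(children)
    return order

print(crawl("a.htm"))                   # a, b, m, s, g, h (breadth-first)
print(crawl("a.htm", depth_first=True)) # a, b, g, h, m, s (depth-first)
```

The only change between the two modes is which end of the queue new URLs go to, which is exactly the "top of the queue" behavior asked about.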
The reason I need it is that some websites offer CGI links that create the requested content in a temporary file, but as soon as you hit the next CGI link, the previous temporary file gets deleted. OE handles one case of this automatically: when the CGI returns a redirect. But if the CGI instead returns a page with a link, and accessing another page invalidates the previous page's link, I can't download it.
Is there already an option to cause new URLs to go to the top? Or is there another way to make this happen?
|Oleg Chernavin||10/20/2004 01:24 am|
Yes, there is such an option. You need to select your Project, click the Properties button on the toolbar, and add the following line to the URLs field: