Trouble downloading Washington Post pages
|Marc C||02/04/2004 01:28 am|
My organization is trying to download opinion pieces and editorials from the Washington Post, such as this one:
This page is one level down from a Yahoo! Opinion & Editorial archive, which links to it like this:
As you can see, the Washington Post's web server converts the Yahoo! link into a dynamically generated page (I think), which OE 2.9 does not seem to follow. My map for www.washingtonpost.com looks like this:
+[wp-adv] (advertisement stuff)
As you can see, [ac2] never gets populated, even though that is ultimately the folder on the Washington Post's server where the HTML file resides.
Any help is much appreciated.
|Oleg Chernavin||02/04/2004 04:48 am|
I followed the second link, but Yahoo told me that there is no such page. Can you tell me a link to all Yahoo.com Opinions and Editorials?
|Marc C||02/05/2004 02:22 pm|
Oleg, please try:
and click on one of the Washington Post links.
|Oleg Chernavin||02/06/2004 07:40 am|
OK. You need to change the Project configuration: set Level to 2, because Yahoo links to a non-existent page on the Washington Post site, which then redirects to the actual article.
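To see why one extra level is needed, here is a rough sketch of a depth-limited crawl. It assumes the starting Yahoo page counts as level 0 and that following the redirect from the non-existent page costs one level, just like following a link. The page names are hypothetical placeholders, and this is only a model of the behaviour described above, not how OE is actually implemented.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
# The redirect is modelled as a page whose only "link" is its redirect target,
# so reaching the real article costs one extra level (an assumption).
LINKS = {
    "yahoo_archive": ["wp_redirect_stub"],
    "wp_redirect_stub": ["wp_article"],  # non-existent page that redirects
    "wp_article": [],
}


def crawl(start, max_level):
    """Return the set of pages reached by a breadth-first crawl limited to max_level."""
    seen = {start}
    queue = deque([(start, 0)])  # the start page is level 0
    while queue:
        page, level = queue.popleft()
        if level == max_level:
            continue  # do not follow links beyond the level limit
        for nxt in LINKS.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, level + 1))
    return seen


print("wp_article" in crawl("yahoo_archive", 1))  # False: stops at the redirect stub
print("wp_article" in crawl("yahoo_archive", 2))  # True: the redirect target is reached
```

With Level 1 the crawl stops at the redirecting page, so the article folder stays empty; Level 2 leaves room for the redirect hop.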
I would also suggest using URL Filters | Filename | Custom configuration to add two keywords to the Included filename keywords:
This will filter exactly the pages you want to download.
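As a rough illustration of what an "Included filename keywords" filter does, here is a sketch that keeps a URL only when its filename contains one of the listed keywords. The keywords and example.com URLs below are hypothetical placeholders (the actual keywords are the ones given above), and OE's real matching rules (case handling, wildcards) may differ.

```python
import posixpath
from urllib.parse import urlparse


def filename_included(url, include_keywords):
    """Return True if the filename part of the URL contains any include keyword.

    Hypothetical model of an "Included filename keywords" filter;
    the real Offline Explorer matching rules may differ.
    """
    filename = posixpath.basename(urlparse(url).path)
    return any(kw in filename for kw in include_keywords)


# Hypothetical keywords and URLs, for illustration only.
keywords = ["story", "column"]
print(filename_included("http://www.example.com/ac2/story123.html", keywords))  # True
print(filename_included("http://www.example.com/wp-adv/banner.gif", keywords))  # False
```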