Downloading only certain filenames

Pablo 12/14/2006 07:37 am
Hello, I wanted to know how to spider a complete site, but download (store) only certain filenames, like "news.php?Id=10000", "news.php?Id=10001", etc.

In this case, they are news sites, where every news item has an ID number. I want to store only the news files, not all the others.

As I understand it, if I set filters to store only "news.php" filenames, the site will not be spidered correctly, because files like index.php and other similar ones will not be downloaded.

How can I do this?

Thanks,
Pablo.
Oleg Chernavin 12/14/2006 07:45 am
Well, the URLs field of the Project supports the DeleteAfterParsing= command, but you will have to tell it to remove the files for almost every other filename combination.

Best regards,
Oleg Chernavin
MP Staff
Pablo 12/14/2006 08:09 am
So there is no solution other than adding as many DeleteAfterParsing= entries as there are kinds of files on the site?

Don't you think it would be interesting to have a StoreOnlyFilenames=file command?

Another nice solution would be to have a "StoreOnlyIncludedFiles" command. In this case, you could specify in the "included files" section the list of files to store, but ALL files would still be parsed.

I think my need is not that unusual... in all the sites I download, I see that there are many "glue" files with no interesting information in them. The interesting information is only in certain files like "filename?ID=xxxxx", where xxxxx is the news number...
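
To make the request concrete, here is a minimal Python sketch of the "parse everything, store only matching filenames" behaviour described above. It is not Offline Explorer code and says nothing about how the program works internally. The SITE dictionary, the example.com URLs, STORE_PATTERN and the crawl() function are all hypothetical names made up for the illustration, and the in-memory dictionary stands in for real downloads.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory site standing in for real HTTP downloads.
SITE = {
    "http://example.com/index.php":
        '<a href="news.php?Id=10000">a</a> <a href="about.php">about</a>',
    "http://example.com/about.php":
        '<a href="news.php?Id=10001">b</a>',
    "http://example.com/news.php?Id=10000":
        "<p>First story</p>",
    "http://example.com/news.php?Id=10001":
        '<p>Second story</p> <a href="index.php">home</a>',
}

# Assumed rule: only URLs ending in news.php?Id=<digits> are stored.
STORE_PATTERN = re.compile(r"/news\.php\?Id=\d+$")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch=SITE.get):
    """Parse every reachable page, but keep only pages matching STORE_PATTERN."""
    stored = {}
    seen = {start_url}
    queue = [start_url]
    while queue:
        url = queue.pop(0)
        html = fetch(url)
        if html is None:
            continue
        # Every page is parsed for links, including the "glue" pages.
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
        # Only pages whose URL matches the pattern are stored.
        if STORE_PATTERN.search(url):
            stored[url] = html
    return stored
```

In this sketch, news.php?Id=10001 is linked only from about.php, so it is found only because about.php is parsed even though about.php itself is never stored.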

Thank you very much,
Pablo.



Oleg Chernavin 12/14/2006 08:32 am
I understand, but we haven't planned this yet.

Oleg.
Pablo 12/14/2006 12:12 pm
OK, Oleg.
I hope it comes in the near future...

Thanks.

Oleg Chernavin 12/14/2006 12:25 pm
Well, there is one other way: use Content Filters, if these news.xxx pages contain some unique words...
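
As a rough illustration of the idea only, and not of how Content Filters are actually configured or implemented in the program, the Python sketch below keeps a page only if its text contains at least one marker phrase. The function names, the sample pages and the "Published on" marker are all assumptions made up for the example.

```python
def passes_content_filter(html, marker_words):
    """Return True if the page text contains at least one marker phrase."""
    text = html.lower()
    return any(word.lower() in text for word in marker_words)


def filter_pages(pages, marker_words):
    """Keep only the pages (url -> html) that pass the content filter."""
    return {
        url: html
        for url, html in pages.items()
        if passes_content_filter(html, marker_words)
    }


# Hypothetical downloaded pages: only the news page carries the marker phrase.
pages = {
    "index.php": "<h1>Welcome</h1>",
    "news.php?Id=10000": "<p>Published on 12/14/2006</p><p>Story text</p>",
}
kept = filter_pages(pages, ["Published on"])
```

This only works if the marker phrase really is unique to the news pages. A phrase that also shows up on index or menu pages would let those through as well.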

Oleg.