How to avoid downloading the same page under different URLs?

Author Message
Mercie 03/03/2006 06:05 am
I think I have a huge problem. The site I'm downloading changes its URLs every few hours, from:

http://laskdjflsdf.urltosite.com/watever.html

to

http://23lkjasldkf.urltosite.com/watever.html

These two pages are identical; only the URLs differ. How do I avoid downloading the same pages over and over?
Oleg Chernavin 03/03/2006 06:11 am
What about using URL Substitutes to place all files on the same server?

URL:
http://*urltosite.com/
Replace:
http://*urltosite.com/
With:
http://www.urltosite.com/

Uncheck the rule.
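
To illustrate what a rule like this does, here is a minimal Python sketch of the same substitution outside the program. It assumes the `*` wildcard means "any characters in the host name before urltosite.com"; that reading, the `RULE` pattern, and the `substitute` helper are assumptions for illustration, not the program's actual implementation.

```python
import re

# Assumed reading of the rule above: "*" stands for any run of characters
# in the host name that precedes "urltosite.com".
RULE = re.compile(r"^http://[^/]*urltosite\.com/")


def substitute(url: str) -> str:
    """Rewrite any matching host to www.urltosite.com (hypothetical helper)."""
    return RULE.sub("http://www.urltosite.com/", url, count=1)


if __name__ == "__main__":
    for u in (
        "http://laskdjflsdf.urltosite.com/watever.html",
        "http://23lkjasldkf.urltosite.com/watever.html",
    ):
        print(u, "->", substitute(u))
```

Both example URLs map to the same address, so the two copies would end up as one file on disk.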

Best regards,
Oleg Chernavin
MP Staff
Mercie 03/03/2006 06:38 am
That would work, but I forgot to mention that it would also parse the same pages over and over. How would I stop that?

I think it's still parsing the same pages under different URLs again and again.

Example of what I mean is:

It parses http://324lkj23l.sitetourl.com/watever.html
then it also parses

http://kl23jl23.sitetourl.com/watever.html

These are identical files, but the URL keeps changing every few hours, which means it will keep parsing the same pages over and over.

How do I stop it from parsing the same page over and over? My queue has over 2,000,000 URLs, and I know for sure that the site doesn't have that many pages.
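
The behavior being asked for can be sketched in Python as a queue filter that remembers pages by a canonical key instead of by the raw URL. This is only an illustration of the idea, not a feature of the program; `BASE_DOMAIN`, `canonical_key`, and `filter_new` are hypothetical names, and the sketch assumes the changing part is always a subdomain of one known domain.

```python
from urllib.parse import urlsplit, urlunsplit

BASE_DOMAIN = "sitetourl.com"  # placeholder domain taken from the example above


def canonical_key(url: str) -> str:
    """Collapse every subdomain of BASE_DOMAIN into one host (assumption)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # port and credentials are dropped
    if host == BASE_DOMAIN or host.endswith("." + BASE_DOMAIN):
        host = "www." + BASE_DOMAIN
    # The fragment is discarded because it never changes the fetched page.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, ""))


def filter_new(urls, seen):
    """Return only URLs whose canonical key is not yet in `seen`."""
    fresh = []
    for url in urls:
        key = canonical_key(url)
        if key not in seen:
            seen.add(key)
            fresh.append(url)
    return fresh


if __name__ == "__main__":
    seen = set()
    queue = [
        "http://324lkj23l.sitetourl.com/watever.html",
        "http://kl23jl23.sitetourl.com/watever.html",
    ]
    print(filter_new(queue, seen))
```

With a filter like this, the second URL is skipped before it is ever parsed, so the queue grows with the number of distinct pages and not with the number of host names.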
Oleg Chernavin 03/03/2006 09:11 am
I am afraid there is no solution right now. Sorry.

Oleg.