|AlexBaldwin||02/23/2017 09:25 am|
|On a side note: Damn, Customer service is fast!
So I had a few inquiries about filters.
I think there should be an exclude wildcard, so that I could write a filter that matches any two characters except ab or ef.
Unless there's a workaround I didn't see?
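The idea above can be sketched with a regular expression negative lookahead, which already expresses "match anything except these strings" (a hypothetical illustration of the requested semantics, not Offline Explorer's actual filter syntax):

```python
import re

def matches_except(s):
    # Match any two characters EXCEPT the exact strings "ab" or "ef".
    # The negative lookahead (?!ab|ef) rejects those two before the
    # two-character wildcard ".." is tried.
    return re.fullmatch(r'(?!ab|ef)..', s) is not None
```

For example, `matches_except("cd")` is true while `matches_except("ab")` is false.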
I needed something similar in one of my projects:
A website had pages classed into categories; the URLs looked like special:ancientpages or special:users.
I wanted to download the whole website while keeping only the categories I wanted.
So a filter like special:[#analysis:#Theory] would exclude all the categories except analysis and theory.
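One way to read the proposed special:[#analysis:#Theory] filter is as an exclusion rule that drops every special: page whose category is not analysis or theory. A small sketch of that reading (hypothetical semantics and example URLs; not Offline Explorer's actual syntax):

```python
import re

# Exclude any special: page whose category is NOT analysis or theory.
EXCLUDED = re.compile(r'special:(?!analysis$|theory$)[^/]+$', re.IGNORECASE)

def should_download(url):
    # True when the URL is either a normal page or one of the two
    # allowed special: categories.
    return EXCLUDED.search(url) is None
```

With this, special:analysis and special:Theory pass, while special:users and special:ancientpages are excluded.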
Sometimes when you download a website, you end up with copies of the same page saved as page.html, page-1.html, page-2.html, and so on. I tried to use URL substitutes in the parsing to no avail.
The only way I "managed" the problem was to create a separate substitute for each number from 1 to n:
-1.html replaced with .html, then -2.html replaced with .html, and so on, until I felt I had enough (I stopped at 5).
|Oleg Chernavin||02/23/2017 09:38 am|
|1. The regexp support is quite limited there. What if you used the Included list instead? Specify analysis and theory, and everything else will be excluded.
2. You could use a substitutes rule that handles all the numbers at once.
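A single regular-expression substitute can collapse every numbered duplicate back to the original file name, instead of one rule per number (a hypothetical Python illustration; Offline Explorer's own substitute syntax may differ):

```python
import re

def canonical(filename):
    # Strip a trailing "-<digits>" just before the .html extension:
    # "page-3.html" -> "page.html"; "page.html" is left unchanged.
    return re.sub(r'-\d+\.html$', '.html', filename)
```

Note the caveat: a page whose real name happens to end in a number, such as chapter-2.html, would also be rewritten, so such a rule should only be applied when the duplicates are known to be copies.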
However, it looks strange to me that it downloads such copies. Could it be because of links to such files on the site? Can you give me the site URL and let me know on which pages I can see such links?
|AlexBaldwin||02/24/2017 01:38 pm|
|Hi, the website I am trying to download is http://artofproblemsolving.com/wiki/index.php/Main_Page
The pages are generated by PHP, so I'm not sure how it ends up producing the -x links.
|Oleg Chernavin||02/26/2017 05:23 pm|
|I downloaded the site with Level=2 and didn't see such links. I found many like:
But those are correct and lead to different, valuable content.