So, I have a few inquiries about filters.
I think there should be an exclude wildcard, such that writing a filter with it
would match any two characters except "ab" or "ef".
Unless there's a workaround I missed?
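In regular-expression terms (not the downloader's own filter syntax, which is the point of the request), the desired "match any two characters except ab or ef" behavior can be sketched with a negative lookahead; this is just an illustration of the semantics being asked for:

```python
import re

# Match exactly two characters, unless they are the
# sequences "ab" or "ef" (negative lookahead).
pattern = re.compile(r"^(?!ab$|ef$)..$")

print(bool(pattern.match("cd")))   # True: two chars, not excluded
print(bool(pattern.match("ab")))   # False: excluded
print(bool(pattern.match("ef")))   # False: excluded
print(bool(pattern.match("abc")))  # False: not exactly two chars
```

An exclude wildcard in the filter language would presumably compile down to something like this lookahead.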
I needed something similar in one of my projects:
A website had its pages classified into categories; the URLs looked like special:ancientpages or special:users.
What I wanted was to download the whole website while fetching only the categories I cared about.
So a filter like special:[#analysis:#Theory] would exclude all the categories except analysis and theory.
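As a hypothetical sketch of what that proposed special:[#analysis:#Theory] filter would do (the URL and category names here are only examples), a URL would be kept unless it names a category outside a whitelist:

```python
import re

# Hypothetical whitelist mirroring the proposed filter:
# keep "special:" pages only for these categories.
WANTED = {"analysis", "theory"}

def keep_url(url: str) -> bool:
    m = re.search(r"special:([a-z]+)", url, re.IGNORECASE)
    if m is None:
        return True  # not a category page: always keep
    return m.group(1).lower() in WANTED

print(keep_url("https://example.org/special:analysis"))      # True
print(keep_url("https://example.org/special:ancientpages"))  # False
```

The idea is exclusion-by-default for the special: namespace, with named exceptions, which is what the bracketed #-list syntax seems intended to express.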
Sometimes when you download a website, you end up with copies of the same page, like so:
and so on. I tried to use URL substitutes in the parsing, to no avail.
The only way I "managed" the problem was to create a substitute for each number from 1 to n,
e.g. -1.html replaced with .html, then -2.html replaced with .html, and so on, until I felt I had enough (five).
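The series of per-number substitutions described above can be collapsed into a single regex rule; as a sketch (assuming the copies really are duplicates of the same page):

```python
import re

# One substitution covers every numbered copy:
# a trailing "-<digits>.html" becomes ".html".
def canonical(name: str) -> str:
    return re.sub(r"-\d+\.html$", ".html", name)

print(canonical("page-1.html"))   # page.html
print(canonical("page-17.html"))  # page.html
print(canonical("page.html"))     # page.html (unchanged)
```

Note the caveat raised later in the thread: if the -x pages actually hold different content, a rule like this would silently merge distinct pages, so it is only safe for true duplicates.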
2. You could use a substitution rule, like:
However, it looks strange to me that it downloads such copies. Could it be because of links to those files on the site? Can you give me the site URL and tell me on which pages I can see such links?
The pages are generated by PHP, so I'm not sure how it ends up producing the -x links,
but they are correct and lead to different, valuable content.