Wildcards/Regular Expressions

Author

Message

AlexBaldwin

02/23/2017 09:25 am

On a side note: Damn, Customer service is fast!

So I had a few inquiries about filters.

Question #1
I think there should be an exclude wildcard, such that if I write
[#ab:#ef]
it will match any two characters except ab or ef.

Unless there's a workaround I didn't see?
I needed something similar in one of my projects:
A website had its pages sorted into categories. The URLs looked like special:ancientpages or special:users.
I wanted to download the whole website, but only the categories I was interested in.
So a filter like special:[#analysis:#Theory] would exclude all the categories except analysis and theory.
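
For anyone who wants to try the idea outside Offline Explorer, here is a minimal Python sketch of an "everything except these" match using a negative lookahead. This is Python's `re` syntax, not Offline Explorer's filter syntax, and the names `ALLOWED` and `should_skip` as well as the example.com URLs are made up for illustration.

```python
import re

# Hypothetical allow-list; the two category names come from the example above.
ALLOWED = ("analysis", "theory")

# Negative lookahead: match "special:<name>" only when <name> is NOT
# one of the allowed categories (comparison is case-insensitive).
EXCLUDE = re.compile(
    r"special:(?!(?:%s)\b)\w+" % "|".join(map(re.escape, ALLOWED)),
    re.IGNORECASE,
)

def should_skip(url: str) -> bool:
    """Return True if the URL is a special: category outside the allow-list."""
    return EXCLUDE.search(url) is not None

if __name__ == "__main__":
    for u in ("http://example.com/wiki/special:users",
              "http://example.com/wiki/special:Theory"):
        print(u, "skip" if should_skip(u) else "keep")
```

URLs without a special: part are never skipped by this check, so the rest of the site would still be downloaded.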

Question #2
Sometimes when you download a website, you end up with copies of the same page like so:
www.website.com/php?title=subject.html
www.website.com/php?title=subject-1.html
www.website.com/php?title=subject-2.html
and so on. I tried to use URL substitutes in the parsing, to no avail.
The only way I "managed" the problem was to create a separate substitute for each number from 1 to n:
-1.html is replaced with .html, then -2.html is replaced with .html, and so on until I felt I had enough (5).

Oleg Chernavin

02/23/2017 09:38 am

1. The regexp support is quite limited there. What if you used the Included list instead? Specify analysis and theory, and everything else will be excluded.

2. You could use substitutes rule, like:

Replace:
-*.html
With:
.html
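
To see what such a rule does to the example URLs, here is a rough Python sketch of a similar substitution. It is only an illustration in Python's `re` syntax, not Offline Explorer's own rule engine, and `canonical_url` is a hypothetical helper name. Unlike the `*` wildcard above, this sketch assumes only digits should be stripped, so a name like my-page.html is left alone.

```python
import re

# Match "-<digits>" only when it sits directly before a final ".html".
NUMBERED_COPY = re.compile(r"-\d+(?=\.html$)")

def canonical_url(url: str) -> str:
    """Drop a trailing -N counter so numbered copies map to one URL."""
    return NUMBERED_COPY.sub("", url)

if __name__ == "__main__":
    for u in ("www.website.com/php?title=subject.html",
              "www.website.com/php?title=subject-1.html",
              "www.website.com/php?title=subject-2.html"):
        print(u, "->", canonical_url(u))
```

All three example URLs end up as www.website.com/php?title=subject.html.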

However, it looks strange to me that it downloads such copies. Could it be because the site itself links to such files? Can you give me the site URL and tell me on which pages I can see such links?

Thank you!

Best regards,
Oleg Chernavin
MP Staff

AlexBaldwin

02/24/2017 01:38 pm

Hi, the website I am trying to download is http://artofproblemsolving.com/wiki/index.php/Main_Page
The pages are generated by PHP, so I'm not sure how it ends up producing the -x links.

Oleg Chernavin

02/26/2017 05:23 pm

I downloaded the site with Level=2 and didn't see such links. I found many like:

http://artofproblemsolving.com/wiki/index.php/2003_AIME_I_Problems/Problem_1
http://artofproblemsolving.com/wiki/index.php/2003_AIME_I_Problems/Problem_2

But they are correct and lead to different, valuable content.

Oleg.
