Hi there,
Just wondering if you are going to support this?
Robots.txt and <META name="robots"> tags.
I'm trying to use Offline Explorer as a spider for about 50 websites that feeds a search engine with data. I would like this feature.
Thanks,
Korshi
Thank you.
Best regards,
Oleg Chernavin
MetaProducts corp.
> So far, our users haven't expressed a need for obeying the Robots standard. But if you want, I can make a version for you that supports robots.txt. If you are interested, please write to us directly at support@metaproducts.com
>
> Thank you.
>
> Best regards,
> Oleg Chernavin
> MetaProducts corp.
> Yes, I'm interested!
>
>
> > So far, our users haven't expressed a need for obeying the Robots standard. But if you want, I can make a version for you that supports robots.txt. If you are interested, please write to us directly at support@metaproducts.com
> >
> > Thank you.
> >
> > Best regards,
> > Oleg Chernavin
> > MetaProducts corp.
Thanks
> Hi there,
>
> Just wondering if you are going to support this?
> Robots.txt and <META name="robots"> tags.
>
> I'm trying to use Offline Explorer as a spider for about 50 websites that feeds a search engine with data. I would like this feature.
>
> Thanks,
>
> Korshi
With robots.txt support, OE could "learn" and update itself with filters for the restricted files/directories it should not get into.
More info: http://www.robotstxt.org/wc/robots.html
---
Michael.
> Sorry for my ignorance, but what does a robots.txt do? How does it differ from the system that OE now uses? Any URLs where I can read up more on it?
>
> Thanks
>
> > Hi there,
> >
> > Just wondering if you are going to support this?
> > Robots.txt and <META name="robots"> tags.
> >
> > I'm trying to use Offline Explorer as a spider for about 50 websites that feeds a search engine with data. I would like this feature.
> >
> > Thanks,
> >
> > Korshi
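To illustrate the standard Michael links to above: a robots.txt file simply lists which user agents may not fetch which paths. A minimal example (these paths are made up, not from any real server):

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
```

A crawler that honors the standard fetches this file from the server root before anything else, then skips every URL whose path matches a Disallow rule.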
Would it be enough if Offline Explorer contained this option per Project, so you would enable it only for Projects where you need such features? For example, it could be this way:
http://www.server.com/
Additional=LoadRobots=http://www.server.com/robots.txt
This way, OE would load the robots.txt file first and then obey its rules for the rest of the files on that server.
Would that work?
Oleg.
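Oleg's LoadRobots scheme boils down to: fetch robots.txt once, then test every candidate URL against its rules before downloading. A rough sketch of that logic (OE itself is not written in Python; the rules and URLs below are illustrative, using Python's standard-library robotparser):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in the LoadRobots scheme this would be
# fetched from http://www.server.com/robots.txt before crawling the site.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Each candidate URL is checked against the rules before download.
print(rp.can_fetch("*", "http://www.server.com/index.html"))      # True
print(rp.can_fetch("*", "http://www.server.com/private/x.html"))  # False
```

The same parser object is reused for every URL on that server, so the robots.txt file is only fetched once per Project.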
In my perfect world, there would be a checkbox on each Project - obey the Robots exclusion standard or not - but that's a great start. A robots.txt file is always stored in the server's root directory (with the name "robots.txt"), so I'm not sure the complete absolute path is necessary.
BTW - will you support the <META name="robots"> tag as well?
Thanks Oleg,
==
Michael.
> Would it be enough if Offline Explorer contained this option per Project, so you would enable it only for Projects where you need such features? For example, it could be this way:
>
> http://www.server.com/
> Additional=LoadRobots=http://www.server.com/robots.txt
>
> This way, OE would load the robots.txt file first and then obey its rules for the rest of the files on that server.
>
> Would that work?
>
> Oleg.
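For the <META name="robots"> side of the question, a crawler has to look inside each downloaded page rather than at a single server-wide file. A hedged sketch of extracting the directives (the class name and sample page are illustrative, not OE's actual code):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.directives.update(
                d.strip().lower() for d in (a.get("content") or "").split(","))

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
p = RobotsMetaParser()
p.feed(page)
print(sorted(p.directives))  # ['nofollow', 'noindex']
```

A crawler would check the resulting set after each page download: "nofollow" means do not follow the page's links, "noindex" means do not pass the page to the search index.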
Oleg,
Will we see that feature soon? Is it implemented in 2.9?
Thanks,
Michael.
> This will be in the URLs field of each Project - if you add the line, OE works with robots.txt files; otherwise it does not.
>
> Oleg.
Michael,
How urgent is that feature for you? I had plans to add it to the 3.0 version, but if it is really important for you to have it right now, I can try to reschedule my plans.
Oleg.
Also, do you plan to support <META name="robots"> as well?
Thanks.
> Michael,
>
> How urgent is that feature for you? I had plans to add it to the 3.0 version, but if it is really important for you to have it right now, I can try to reschedule my plans.
>
> Oleg.
We hope for November this year.
> Also, do you plan to support <META name="robots"> as well?
I will try.
Oleg.