Robots.txt and

Author Message
Korshi 03/29/2006 07:02 am
Hi there,

Just wondering if you are going to support this?
robots.txt and <META name="robots"> tags.

I'm trying to use Offline Explorer as a spider for about 50 websites, feeding the downloaded data to a search engine. I would like this feature.

Thanks,

Korshi
Oleg Chernavin 03/29/2006 07:02 am
So far, our users haven't expressed a need for obeying the Robots Exclusion Standard. But if you want, I can make a version for you that supports robots.txt. If you are interested, please write to us directly at support@metaproducts.com

Thank you.

Best regards,
Oleg Chernavin
MetaProducts corp.
Korshi 03/29/2006 07:02 am
Yes, I'm interested!


Michael 03/29/2006 07:02 am
I would love the same feature as well. OE E as a crawler performs wonderfully.

03/29/2006 07:02 am
Sorry for my ignorance, but what does robots.txt do? How does it differ from the system that OE uses now? Are there any URLs where I can read more about it?

Thanks

Michael 03/29/2006 07:02 am
robots.txt and <META name="robots"> tags are URL exclusion standards. Supporting them would let OE download only the files and URLs that the server permits.

OE could "learn" from them which files and directories it should not enter, and update its filters accordingly.

More info: http://www.robotstxt.org/wc/robots.html
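For anyone who wants to see the idea in practice, here is a minimal sketch using Python's standard urllib.robotparser module. It is only an illustration of how a crawler can check a URL against robots.txt rules, not how OE works or would work. The rules, the "ExampleSpider" agent name and the URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Example rules, invented for illustration (not taken from a real server).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

def build_parser(robots_text):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# "ExampleSpider" is a hypothetical user-agent name.
print(parser.can_fetch("ExampleSpider", "http://www.server.com/index.html"))         # True
print(parser.can_fetch("ExampleSpider", "http://www.server.com/private/data.html"))  # False
```

A crawler would run a check like this before requesting each URL and skip the ones that return False.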

---
Michael.

Oleg Chernavin 03/29/2006 07:02 am
Would it be enough if Offline Explorer had this option per Project, so that you would enable it only for the Projects where you need it? For example, it could look like this:

http://www.server.com/
Additional=LoadRobots=http://www.server.com/robots.txt

This way, OE would load the robots.txt file first and then obey its rules for the rest of the files on that server.
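To make the proposal concrete, here is a rough Python sketch of that flow: read the URLs field, pick out the Additional=LoadRobots= line, and filter candidate URLs against the rules. This is only an assumption about how such a line could be interpreted, not OE's actual implementation. The function names, the "ExampleSpider" agent and the sample rules are hypothetical, and the robots.txt text is passed in directly rather than downloaded.

```python
from urllib.robotparser import RobotFileParser

# Line prefix taken from the proposal above; everything else is hypothetical.
PREFIX = "Additional=LoadRobots="

def split_urls_field(field_text):
    """Separate start URLs from LoadRobots lines in a Project's URLs field."""
    start_urls, robots_urls = [], []
    for line in field_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(PREFIX):
            robots_urls.append(line[len(PREFIX):])
        else:
            start_urls.append(line)
    return start_urls, robots_urls

def allowed_urls(candidates, robots_text, agent="ExampleSpider"):
    """Keep only the candidate URLs that the given robots.txt text permits."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in candidates if parser.can_fetch(agent, url)]

field = """http://www.server.com/
Additional=LoadRobots=http://www.server.com/robots.txt"""

start_urls, robots_urls = split_urls_field(field)

# In a real crawler the rules would be downloaded from robots_urls[0] first.
sample_rules = "User-agent: *\nDisallow: /private/"
queue = ["http://www.server.com/a.html", "http://www.server.com/private/b.html"]
print(allowed_urls(queue, sample_rules))
```

A Project without the LoadRobots line would simply return an empty robots_urls list, and the filtering step would be skipped.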

Would that work?

Oleg.
Michael 03/29/2006 07:02 am
It's a great start, but where would the Additional=LoadRobots line need to go? As a parameter when running OE.exe? Or can it be set for each project individually?

In my perfect world, there would be a checkbox on each project to obey Robots exclusion or not, but that's a great start. Every robots.txt file is stored in the server's root directory (under the name "robots.txt"), so I'm not sure the complete absolute path is necessary.
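Since the file always sits at the server root, its address can be worked out from any URL on that server. A small Python sketch of that, with robots_url_for being a made-up helper name:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url_for(page_url):
    """Build the robots.txt URL for the server that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (with port), replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url_for("http://www.server.com/docs/page.html?x=1"))
# http://www.server.com/robots.txt
```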

BTW - will you support the <META name="robots"> tag as well?
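For reference, this is roughly what reading that tag involves, sketched with Python's standard html.parser module. The class and function names are invented for the example, and it says nothing about how OE would handle the tag.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for item in content.split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)

def robots_directives(html_text):
    """Return the set of robots directives found in a page."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives

page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```

A crawler honoring the tag would skip saving a page marked "noindex" and skip following its links when it is marked "nofollow".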

Thanks Oleg,

==
Michael.

Oleg Chernavin 03/29/2006 07:02 am
This will go in the URLs field of each Project: if you add the line, OE works with robots.txt files; otherwise it does not.

Oleg.
Michael 03/29/2006 07:02 am
Hi Oleg,

Will we see that feature soon? Is it implemented in 2.9?

Thanks,

Michael.

Oleg Chernavin 03/29/2006 07:02 am
Michael,

How urgent is that feature for you? I had planned to add it in version 3.0, but if it is really important for you to have it right now, I can try to rearrange my schedule.

Oleg.
Michael 03/29/2006 07:02 am
No need, let's wait for version 3.0. When is it scheduled, btw?
Also, do you plan to support <META name="robots"> as well?

Thanks.

Oleg Chernavin 03/29/2006 07:02 am
> No need, let's wait for version 3.0. When is it scheduled, btw?

We hope for November this year.

> Also, do you plan to support <META name="robots"> as well?

I will try.

Oleg.