File filters and URL filters
|jondank||10/01/2011 04:42 pm|
|I am trying to download files from an internal company website. I only want to download files containing text, such as MS Word, PowerPoint, and Excel documents. I added these file types to the File Filters list.
I do not want files ending in .htm, .aspx, and other web file extensions. I find that if I uncheck .aspx, the download stops after one file. It seems as if the filters are applied to the crawling process and not just the download process. I also tried URL filters using keywords. The same basic problem occurs.
How can I avoid downloading all of the web junk files (aspx, htm...)?
|jondank||10/01/2011 04:55 pm|
Some of the sites I am downloading are SharePoint sites... I wonder if that is a problem.
I seem to get all of the desired documents but I get all the web junk that I don't want.
|jondank||10/01/2011 05:11 pm|
|I am using the latest 6 beta Enterprise version.|
|Oleg Chernavin||10/02/2011 05:54 am|
|Please do not uncheck individual HTM, ASP or ASPX extensions. Uncheck the whole File Filters - Text category instead.
This still allows all web pages to be downloaded, but they will not be stored on your disk. If Offline Explorer did not download web pages (ASPX, HTM files), how could it find links to the files you need? It has to load the pages and follow their links.
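
To illustrate the difference between loading a page and storing it, here is a minimal Python sketch. It is not Offline Explorer's code: the extension sets, the `crawl` function and the `fetch` callback are all hypothetical names made up for the example. Pages are fetched only so their links can be followed, and only the document types are recorded as stored.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Hypothetical extension sets for illustration only.
PAGE_EXTS = {"", ".htm", ".html", ".asp", ".aspx"}   # load and parse, do not store
KEEP_EXTS = {".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx"}  # store


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extension(url):
    """Lower-cased file extension of the URL path ('' if none)."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def crawl(start_url, fetch):
    """Breadth-first crawl. fetch(url) returns bytes, or None on failure.

    Returns the list of URLs that would be stored on disk.
    """
    seen = {start_url}
    queue = [start_url]
    stored = []
    while queue:
        url = queue.pop(0)
        body = fetch(url)
        if body is None:
            continue
        if extension(url) in KEEP_EXTS:
            stored.append(url)          # a wanted document: keep it
            continue
        # A web page: parse it for links, but never add it to 'stored'.
        parser = LinkParser()
        parser.feed(body.decode("utf-8", errors="replace"))
        for href in parser.links:
            link = urljoin(url, href)
            if link not in seen and extension(link) in PAGE_EXTS | KEEP_EXTS:
                seen.add(link)
                queue.append(link)
    return stored
```

If .aspx were removed from `PAGE_EXTS` in this sketch, the crawl would stop at the start page, which matches the "stops after one file" behaviour described above.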
|jondank||10/02/2011 09:27 am|
Thanks. I will try that.
My primary purpose is to download the desired documents for indexing by an advanced search tool called dtSearch.
I do not need to browse offline as I will be using dtSearch to review my documents.
I selected Online Translation and Mark online links as nofollow... in case I find a need to link back to the original source.
Are you aware of any issues in downloading files stored in SharePoint sites?
|Oleg Chernavin||10/02/2011 09:45 am|
|It doesn't matter which translation you choose because, as I understand it, the web page files will not be stored.
SharePoint sites are not easy to download. They use a special kind of linking, the so-called doPostBack. Offline Explorer supports it, but the mechanism changes with new versions of the SharePoint engine, so I often add improvements to handle such links.
If some link is not followed, please let me know. It will not be easy to solve (because you are working with an internal site), but I will do my best.
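
For readers unfamiliar with doPostBack links: in ASP.NET pages they usually look like `javascript:__doPostBack('target','argument')`, and clicking one submits the page's form with the two values placed in the hidden `__EVENTTARGET` and `__EVENTARGUMENT` fields. The Python sketch below only shows that general idea. It is not how Offline Explorer handles these links, and the function name `postback_fields` and the sample control name are made up for the example.

```python
import re

# Matches __doPostBack('target','argument') with single or double quotes.
POSTBACK_RE = re.compile(
    r"""__doPostBack\(\s*['"]([^'"]*)['"]\s*,\s*['"]([^'"]*)['"]\s*\)"""
)


def postback_fields(href, hidden_fields):
    """Build the form fields a doPostBack link would submit.

    href          -- the link's href (or onclick) text
    hidden_fields -- hidden inputs already on the page, e.g. __VIEWSTATE
    Returns a dict of fields to POST, or None if href is not a postback link.
    """
    match = POSTBACK_RE.search(href)
    if match is None:
        return None
    fields = dict(hidden_fields)            # copy; do not modify the caller's dict
    fields["__EVENTTARGET"] = match.group(1)
    fields["__EVENTARGUMENT"] = match.group(2)
    return fields
```

A crawler would then POST these fields back to the page's own URL instead of following an ordinary link, which is why such sites are harder to download than plain linked pages.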