During the latest download I watched the queue slowly shrink, and then after a considerable downtrend it turned back up again. I checked the download folder and observed that no more files were actually being downloaded to my drive. It seems it was stuck in a loop. I stopped it.
My project profile is below.
Stream 1.2 File
LastStarted=10/4/2011 1:56:09 AM
LastEnded=10/4/2011 7:11:38 AM
I did not notice any unusual URLs, but of course many of the URLs are strange. That is why I wanted to eliminate all the web junk DNA.
I will try your idea re attempts. I don't think it is a userid/password issue, as I have it set to prompt me. I could be having an attempts issue in terms of contention at the URL. I think I saw that happen once.
I reduced the number of retries to 2 and restarted the download of missing files. I noticed that while it is not actually copying the unwanted files (such as aspx), it is reporting that they are being downloaded. This seems inefficient.
Reducing the number of retries did not fix the problem. I watched it loop over all of the lowest-level folders alphabetically. It processed A through Z and then started at A again. It keeps looping like this. I'm stopping it and moving on to the next project.
I would like to help you debug this. I'll send you my project properties. Hopefully you can see that I have set it to load only from the starting directory. I did this in each of Text, Images, Video, Audio, Archive, User Defined and Other. Only Archive and User Defined are checked.
There is no depth limit set.
What does the Enable download directory check box do on the Parsing screen? I have run it with and without the box checked, and it seems to have no effect.
I also have Check files integrity, Explore all possible subdirectories and Suppress Web site errors checked on the Parsing screen.
I unchecked the integrity check and Explore all possible subdirectories. Files in the queue started to drop. I stopped the current run and restarted.
Is it possible to give me access to the site, and could you let me know several direct URLs that produce that effect?
I could write to you directly at the e-mail address you use in this forum.
I would not be able to give you access; I access my client's site through a VPN.
You can email me privately.
Not sure how I could technically let you see the screen.
What city are you in?
I'm having the same problem: indexing a SharePoint site gets caught in an infinite loop and never finishes. If I set a level restriction, even a huge one, it will finish but doesn't pull all the documents. A level of 5 should be more than enough to find all the documents, but even at 100 it doesn't. So I am trying to get out of this infinite loop by adding entries to the filename filter to exclude the troublesome URLs, but this isn't having any effect. The URL filter doesn't seem to work either.
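For what it's worth, a common cause of this kind of crawler loop is that SharePoint list views generate URLs that differ only in query parameters (sort order, view, paging), so the crawler keeps seeing endlessly "new" pages. A minimal sketch of how such loops are typically broken, by deduplicating on a normalized URL (this is a hypothetical illustration, not how this particular product works internally):

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    # Drop the query string and fragment, and lowercase the scheme and host,
    # so URLs that differ only in view/sort/paging parameters collapse to one key.
    scheme, netloc, path, _query, _frag = urlsplit(url)
    return urlunsplit((scheme.lower(), netloc.lower(), path, "", ""))

seen = set()

def should_visit(url: str) -> bool:
    # Visit each normalized URL at most once.
    key = normalize(url)
    if key in seen:
        return False
    seen.add(key)
    return True
```

With this, AllItems.aspx sorted by Name and the same page sorted by Modified count as one page instead of two, which is exactly the distinction that can otherwise make a crawl revisit the same folders forever.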
If I have a URL like the above, I would expect at least one of the filters below to prevent it from downloading, but the Test button always says "The URL will be downloaded".
Under URL Filter - Filename, I have the below set as exclude keywords.
All the documents I care about are under "Shared Documents", so I can safely exclude a number of other URLs, but it doesn't work the way I expect.
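One possible reason an exclude-keyword filter appears to do nothing on a SharePoint site: if the comparison runs against the raw URL, a space is encoded as %20, so a keyword containing a literal space (like "Shared Documents") never matches. A hedged sketch of a simple substring-style exclude filter that decodes the URL first (the function and keyword list here are hypothetical, for illustration only):

```python
from urllib.parse import unquote

def is_excluded(url: str, keywords: list[str]) -> bool:
    # Percent-decode the URL ("%20" -> " ") and lowercase both sides,
    # then exclude if any keyword appears anywhere in the decoded URL.
    decoded = unquote(url).lower()
    return any(kw.lower() in decoded for kw in keywords)
```

If your product's filter matches raw, case-sensitive URLs instead, trying the encoded form of the keyword (e.g. "Shared%20Documents") in the exclude list would be a quick way to test that theory.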
I have done everything listed above in this thread but it did not help.
Please contact me via email@example.com and let's schedule a time.