|jondank||10/04/2011 08:09 am|
During the latest download I watched the queue slowly shrink, and then, after a considerable drop, it started growing again. I checked the download folder and saw that no more files were actually being downloaded to my drive. It seemed to be stuck in a loop, so I stopped it.
My project profile below.
Stream 1.2 File
LastStarted=10/4/2011 1:56:09 AM
LastEnded=10/4/2011 7:11:38 AM
|Oleg Chernavin||10/04/2011 08:11 am|
|Did you monitor the Queue tab to see whether there are any strange URLs? Please also try decreasing the number of attempts in the Internet tab of the Ribbon.
|jondank||10/04/2011 08:25 am|
I did not notice any unusually strange URLs but of course usually many of the URLs are strange. That is why I wanted to eliminate all the web junk DNA.
I will try your idea re attempts. I don't think it is a userid/password issue, as I have it set to prompt me. I could be having an attempts issue in terms of contention at the URL. I think I saw that happen once.
|Oleg Chernavin||10/04/2011 08:28 am|
|I understand. I don't have other ideas, because I can't even reproduce this without access to the site.
|jondank||10/04/2011 08:30 pm|
I reduced the number of retries to 2 and restarted the download of missing files. I noticed that while it is not actually copying the unwanted files (such as aspx), it is reporting that they are being downloaded. This seems inefficient.
|jondank||10/04/2011 11:41 pm|
Reducing the number of retries did not fix the problem. I watched it loop over all of the lowest-level folders alphabetically. It processed A through Z and then started at A again. It keeps looping like this. I'm stopping it and moving to the next project.
I would like to help you debug this. I am sending you my project properties. Hopefully you can see that I have set it to load only from the starting directory. I did this in each of Text, Images, Video, Audio, Archive, User Defined and Other. I only have Archive and User Defined checked.
There is no depth limit set.
What does the Enable download directory check box do in the Parsing screen? I have run it with and without the box checked and it seems to have no effect.
I also have Check files integrity, Explore all possible subdirectories and Suppress Web site errors checked in the Parsing screen.
|jondank||10/04/2011 11:59 pm|
I unchecked the integrity check and Explore all possible subdirectories. The number of files in the queue started to drop. I stopped the current run and restarted.
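The cause of the loop is not established in this thread. As a general illustration only, here is a minimal Python sketch of one common way a crawl queue can keep refilling (the same folder reached under slightly different URL spellings) and how a visited set keyed on a normalized URL prevents it. The function names `normalize` and `crawl`, the normalization policy (dropping query strings and trailing slashes) and the `example.test` URLs are all assumptions made for the example; this is not how Offline Explorer works internally.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Collapse trivially different spellings of one URL (an assumed policy)."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Assumption: query string and fragment do not identify a different page.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))

def crawl(start, get_links, max_pages=1000):
    """Breadth-first crawl that never queues the same normalized URL twice."""
    seen = {normalize(start)}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            key = normalize(link)
            if key not in seen:  # without this check the folders link back forever
                seen.add(key)
                queue.append(link)
    return visited

# Hypothetical site: folders A and B link back to the root and to each other.
SITE = {
    "http://example.test/docs": ["http://example.test/docs/A/", "http://example.test/docs/B/"],
    "http://example.test/docs/A/": ["http://example.test/docs/?sort=name", "http://example.test/docs/B"],
    "http://example.test/docs/B/": ["http://example.test/docs/A?view=1"],
}

pages = crawl("http://example.test/docs", lambda u: SITE.get(u, []))
print(len(pages))
```

With the `seen` check removed, the same three folders would be queued again on every pass, which matches the kind of A-to-Z-and-back-to-A behaviour described above, although the real cause here may be different.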
|Oleg Chernavin||10/05/2011 02:17 am|
|I see now. It would be great to reproduce and improve that check.
Is it possible to give me access to the site and let me know several direct URLs that give that effect?
I could write to you directly at the e-mail address you use in this forum.
|jondank||10/05/2011 09:15 pm|
I would not be able to give you access. I access my client's site through a VPN.
You can email me privately.
Not sure how I could technically let you see the screen.
What city are you in?
|Oleg Chernavin||10/06/2011 06:19 am|
|OK. I just sent you an E-mail. Thank you!
|mkirouac||12/28/2011 02:19 pm|
I'm having the same problem: indexing a SharePoint site gets caught in an infinite loop and never quits. If I set a level restriction, even a huge one, it will finish but doesn't pull all the documents. A level of 5 should be more than enough to find all documents, but even at 100 it doesn't. So I am trying to get out of this infinite loop by adding entries to the filename filter to exclude the troublesome URLs, but this isn't having any effect. The URL filter doesn't seem to work either.
If I have a URL like the above, I would expect at least one of the filters below to prevent it from downloading but the Test button always says "The URL will be downloaded".
Under URL Filter - filename, I have the below under exclude keywords.
All the documents I care about are under "Shared Documents", so I can safely exclude a number of other URLs, but it doesn't seem to work the way I expect.
I have done everything listed above in this thread but it did not help.
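For reference, here is a small Python sketch of the behaviour expected from an exclude-keyword filter: a URL is rejected if any keyword appears anywhere in it. It checks both the raw URL and its percent-decoded form, since a folder such as "Shared Documents" appears as `Shared%20Documents` in a URL. The function name `will_download`, the keywords and the `example.test` URLs are hypothetical; Offline Explorer's actual matching rules may differ, and this only shows the expectation being described.

```python
from urllib.parse import unquote

def will_download(url, exclude_keywords):
    """Return False if any exclude keyword occurs in the URL (case-insensitive)."""
    # Compare against both the raw and the percent-decoded spelling of the URL.
    candidates = {url.lower(), unquote(url).lower()}
    for keyword in exclude_keywords:
        needle = keyword.lower()
        if any(needle in candidate for candidate in candidates):
            return False
    return True

# Hypothetical URLs and keywords, for illustration only.
excludes = ["_layouts", "Forms/"]
print(will_download("http://example.test/sites/team/_layouts/viewlsts.aspx", excludes))
print(will_download("http://example.test/sites/team/Shared%20Documents/report.doc", excludes))
```

If a filter's test reports "will be downloaded" for a URL that a check like this would reject, it is worth verifying whether the keyword is being compared against the encoded or the decoded URL, and against the whole URL or only the file name part.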
|Oleg Chernavin||12/28/2011 04:18 pm|
|Can we have a remote support session, so that I can take a look at the site (I assume it is not available online)?
Please contact me via firstname.lastname@example.org and let's schedule the time.