File Filters and Duplicates in OE 6

Author Message
Sean 11/09/2011 09:24 am
Hi,
I am trying to download specific content (jpg's and mov's) from the members area of amateurallure.com. I have configured the project as best I can but I am receiving unwanted files and duplicates.

I am trying to download only *.mov and *.jpg files but NOT certain .mov and .jpg files (via excluded entries).
Example: I want video01.mov and video01.jpg but NOT videosm01.mov or video_th.jpg or any other video or image type.

I am also getting 2 of every file downloaded, which wastes bandwidth and space.

Lastly, files that aren't video or image are being downloaded - I don't know what these are but they are the same size as video files and have the video file name in them. I have tried to exclude them but they are being downloaded anyway.
Example: This is one of many files that have been downloaded that are neither .mov nor .jpg - http://www.amateurallure.com/members/_girls/maelynn/_video/maelynn02.mov?PSSO=ei9Vb2RraGgramVtUUlxTEpjc1Q4aDRiVXlITDI3d3E1Mm84S0tJeUpsTG5iK2E5cTNpTmJiY1ZmODVNVVVWZQpyTUN5dS94VlQ3MzM2SXlra2MwZzZjQzlLdWYvZVlmdAo*
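
A note on URLs of this shape: everything after the `?` is a query string, so the path itself still ends in .mov. The rough Python sketch below shows one way to separate the two. It uses only the standard library; the helper name `path_extension`, the example.com address and the `TOKEN` value are made up for illustration, and it says nothing about how Offline Explorer itself names or classifies such files.

```python
from urllib.parse import urlsplit
import posixpath

def path_extension(url):
    """Return the lowercase extension of the URL path, ignoring any query string."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lower()

# Hypothetical URL with the same shape as the example above (token shortened to TOKEN).
url = "http://example.com/members/_video/clip02.mov?PSSO=TOKEN"

print(path_extension(url))   # extension taken from the path only
print(urlsplit(url).query)   # the part after "?"
```

Seen this way, the "unknown" files are most likely the same videos requested with a session token appended, which would explain why they have the same size as the video files.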

My project settings are... (I am providing only settings that are selected and/or not blank)
Project > Addresses: http://www.amateurallure.com/members/
Level Limit: unchecked
Do Not download existing files: selected
File Filters > Images (jpeg and jpg) and Videos (mov) checked
Load Using URL filter settings selected for both
URL Filters > Directory: Load files only within the starting directory and below selected
FileName: Load files only with the starting filename is NOT selected
Excluded Keywords: *_th.jpg *sm0*.mov *.mov@psso* *.mov?psso*
Content Filters > When keywords are found in a page - save these pages is checked
All settings beneath advanced are the defaults - only thing I have added is the download location and userid password.
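
To make the intent of the Excluded Keywords concrete, here is a small Python sketch that checks sample file names against the same four patterns. It is an approximation only: it assumes case-insensitive, shell-style wildcard matching via the standard `fnmatch` module, which is not necessarily how Offline Explorer evaluates its filters. The helper name `is_excluded` and the sample names are illustrative.

```python
from fnmatch import fnmatchcase

# The four Excluded Keywords from the settings above.
EXCLUDED = ["*_th.jpg", "*sm0*.mov", "*.mov@psso*", "*.mov?psso*"]

def is_excluded(url, patterns=EXCLUDED):
    """Return True if the lowercased URL matches any exclusion pattern.

    Assumption: matching is case-insensitive and uses shell-style wildcards,
    where "*" matches any run of characters and "?" matches any single
    character (including a literal "?").
    """
    lowered = url.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in patterns)

samples = [
    "video01.mov",
    "video01.jpg",
    "videosm01.mov",
    "video_th.jpg",
    "video01.mov?PSSO=TOKEN",
]
for name in samples:
    print(name, "excluded" if is_excluded(name) else "kept")
```

Under these assumptions video01.mov and video01.jpg are kept and the other three samples are excluded, which is the behaviour the filters are meant to produce.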

The directory structure off of the members folder is...
members/_bonus
members/_classics
members/_friends
members/_girls
members/_girlspop
members/girls
members/images
I think some of these have the same content beneath them but in different folder names and locations (hence the duplicates).
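
One way to check whether the duplicates really are identical files stored under different folders is to group the downloaded files by a hash of their contents. The Python sketch below does that with the standard library only; the function name `find_duplicates` is illustrative, and the script is independent of Offline Explorer.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root):
    """Group files under root by the SHA-256 of their contents.

    Returns a list of groups; each group is a list of paths whose
    contents are byte-for-byte identical (two or more files).
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large videos are not loaded whole.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates` on the download folder and printing each group shows which URLs map to the same content, for example the same video under both members/_girls and members/girls.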

Any help you could offer would be greatly appreciated. I realize there is a lot of information here but I figure the more you know the better you'll be equipped to assist.

Thank You.
Oleg Chernavin 11/09/2011 09:29 am
Everything looks correct from your description. Can you post exact settings here? Select the Project, press Ctrl+C and paste to the forum message.

Thank you!

Best regards,
Oleg Chernavin
MP Staff
Oleg Chernavin 11/09/2011 09:30 am
Regarding duplicates - please try to uncheck the "Check files integrity" box in the Properties - Parsing section. Would this help?

Oleg.
Sean 11/10/2011 06:09 pm
Below find project settings gathered via Project > Properties > CTRL+C

[Object]
OEVersion=Enterprise 6.0.0.3658
Type=0
IID=1
Caption=Amateur Allure
URL=http://www.amateurallure.com/members/
Dir=D:\Downloads\amateurallure\
MVer=5
Lev=1000001
Weekday=257
User=******
Psw=******
LimTSize=10000
LimNumber=5000
LimTime=100
FMGroup=2
SkipMedia=True
FTText.Exts=htmlhtmaspaspxjspstmstmlidcshtmlhtxtxttextxspxmlrxmlcfmwmlphpphp3
FTImages.Exts=bmpfifgifipxj2cj2kjp2jpegjpglwfpngtiftiffwbmpwebpxbm oooooooxxooooooo
FTVideo.Exts=aniasfasxaviflcfliflvm1vm2vm4vmovmp4mpegmpgramrmrvsmilvivvobwmv ooooooooooxoooooooooo
FTAudio.Exts=wavriffmp3midmp2m3uravocwmaapeoggm4aaif
FTArchive.Exts=7zziparcgzzarjlhalayleirarcabtarpakacejarpdftgzexeiso
FTUDef.Exts=jsaxdcssssivbsdtdxslswfclassent
FTText.B=xoxooo
FTImages.B=ooxooo
FTVideo.B=ooxooo
FTAudio.B=xoxooo
FTArchive.B=xoxooo
FTUDef.B=xoxooo
FTOther.B=xoxooo
FTSizes=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0
NotIgnoreLogout=False
RPathBx=1
RFileEx=*_th.jpg*sm0*.mov*.wmv*.mov?psso**.mov@psso* xxxxx
RProt=255
LastStart=217:163:34:113:233:242:227:64:
LastEnd=30:82:209:46:234:242:227:64:
LastStarted=11/8/2011 7:04:53 AM
LastEnded=11/8/2011 7:38:13 AM
S200=3145
S304=637
SAbr=21207
SPar=3133
SSav=25
SLast=302
SSiz=2205185065
SMdf=25
SHTML=3120
SSuccDowns=12
LFiles=3800
LSize=2970427504
Stopped=True
Flags=1
CFFlags=64
ImgDim=0,0,0,0
PrevURL=http://www.amateurallure.com/members/
ExploreDirs=True
ConvertRSS=True
LIndexed=False
IndexFiles=False
MapStats=701,8160438311,0,0,665,24391578,0,0,36,8136046733,0,0,0,0
Oleg Chernavin 11/10/2011 06:21 pm
The settings are correct. There could be a bug with the unchecked File Filters sections. Can you please give me access to the site via support@metaproducts.com? I will make the download with your settings and see if this reproduces.

Thank you!

Oleg.
Oleg Chernavin 11/11/2011 04:03 pm
Yes, I got the e-mail and am trying to reproduce this.

Oleg.
Oleg Chernavin 11/11/2011 04:22 pm
I made a download (not complete) and enabled logging to see the rejected URLs.

The log clearly shows that the _tn.jpg files and the video files with .mov?psso=.... are not allowed for the download by the parser. It works correctly.

Regarding the duplicates - did you try to uncheck the "Check files integrity" box? Does it help? If not, please describe what kind of duplicates you have and what their URLs are. Also, on which pages can I find these links?

Oleg.