JavaScript trouble

fuma
09/29/2017 02:53 am
I am unable to download some images from the following address http://imgsrc.ru/believinga/a831292.html

They use a script to get the URL, and Offline Explorer does not get the image. It will if you set the download method to open every page under Project -> Address -> Download method; however, this is really slow, and I don't think that method follows my file filter rules, as it attempts to download every link it comes across.

The script they use is pasted below, via View Source from inside the OE browser.

<a id='next_url' href='/believinga/28446492.html#bp'><img class='prv big' id='bip' alt='P6030121.JPG'><script>var s='HjT';</script><br></a>

<script type='text/javascript'>

var p=document.getElementById('prc').src,
z=p.lastIndexOf('/')+1,
a=p.indexOf('.jpg')-3,
b=p.slice(8,z)+'imgsrc.ru_',
i=p.slice(z,a)+p.charAt(48)+s.charAt(0)+s.charAt(2);
document.getElementById('bip').src='http://b'+b+i+'.jpg';
</script>
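For reference, the page script above can be reproduced as a standalone function. This is only a sketch of its logic, assuming p is the src of the 'prc' preview image; the sample URL in the usage comment is hypothetical (on the real site, what charAt(48) picks up depends on the actual preview URL's length):

```javascript
// Replicates the page's URL-assembly logic from the post above.
// p: src of the 'prc' preview image; s: the short token ('HjT' here).
function buildImageUrl(p, s) {
  const z = p.lastIndexOf('/') + 1;        // start of the filename
  const a = p.indexOf('.jpg') - 3;         // filename minus its last 3 chars
  const b = p.slice(8, z) + 'imgsrc.ru_';  // path with scheme and first host char stripped
  const i = p.slice(z, a) + p.charAt(48) + s.charAt(0) + s.charAt(2);
  return 'http://b' + b + i + '.jpg';
}

// Hypothetical example (not a real imgsrc.ru URL):
// buildImageUrl('http://s1.example.net/folder/subfolder/P6030121abc.jpg', 'HjT')
// -> 'http://b1.example.net/folder/subfolder/imgsrc.ru_P6030121bHT.jpg'
```

This shows why a plain HTML parser cannot find the image link: the final URL only exists after the script runs, which is why the open-every-page download mode is needed.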

Oleg Chernavin
09/29/2017 02:55 am
Yes, it is impossible to calculate links from this script. The only way is the new download mode.

It is slower, but it would still work. Regarding filters - can you please describe what goes wrong when downloading in this mode?

Thank you!

Best regards,
Oleg Chernavin
MP Staff
fuma
09/30/2017 05:11 am
So I noticed it did several things I was not expecting.

1: In my project properties under Address I was using SkipParsingAfter=, but it seems to load the whole page and grab everything. (This might not be an error.)

2: Under File Modification, I have it set to "do not download existing files". However, it re-downloads all files.

3: I set up proxy servers via the settings inside OE, but I think the internal browser is not using them.
If I just browse to https://www.whatismyip.com/ it shows my real IP, not the IP of the proxy I have set up.

Is it possible to tell OE to use more internal browser windows to speed up this mode?
Oleg Chernavin
09/30/2017 05:14 am
1. Yes, it downloads the whole page, but then cuts off the part after the specified words or contents. So, only a part of it would be saved to disk.

2. It is possible that the server doesn't support file modification checks. Can you please tell me which site you are trying to download? I will see what happens.

3. Unfortunately, the Internal Browser is an embedded MS Internet Explorer and it uses its proxy server settings.

Yes, we plan to allow more tabs/windows in the new download mode soon.

Oleg.
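Since the Internal Browser is an embedded Internet Explorer, it reads the per-user WinINET proxy settings rather than OE's own proxy list. As a workaround sketch (Windows only, and an assumption about how one might script it, not an OE feature), those settings live behind the well-known registry values used by IE's proxy dialog; the helper below only builds the reg.exe commands:

```javascript
// WinINET per-user proxy settings key (the values behind IE's proxy dialog).
const KEY = 'HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Internet Settings';

// Build the reg.exe commands that enable a proxy for the current user.
// hostPort: e.g. '127.0.0.1:8080'. Run each command with
// require('child_process').execSync(cmd) on Windows if desired.
function proxyCommands(hostPort) {
  return [
    `reg add "${KEY}" /v ProxyEnable /t REG_DWORD /d 1 /f`,
    `reg add "${KEY}" /v ProxyServer /t REG_SZ /d ${hostPort} /f`,
  ];
}
```

Note this changes the proxy for everything that uses WinINET (including regular IE browsing), which is exactly the drawback raised in the next post.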
fuma
10/08/2017 06:42 pm
3. Unfortunately, the Internal Browser is an embedded MS Internet Explorer and it uses its proxy server settings.

Is this something you plan to change? My current proxy list has several entries to avoid download blocks on some websites. Plus, I don't want my normal internet traffic to go through a proxy.


Also, I found a way to make it ignore the filters I set up. I have OE run at startup in case of restarts.
If I have a sequence of sites to grab, OE ignores the filter settings when it has something new to download.
It starts grabbing every link on the page, and the referer is blank. If I hit Stop and the sequencer loads it again, the same thing happens.
If I stop the sequencer and run the project via right-click -> Start, it runs normally, and if I start the sequencer after this, it also runs normally.

The following project is in a sequence that repeats several projects.

address:

http://imgsrc.ru/cat/18-puteshestviya.html
Channels=10
ChannelsPerServer=3
Delay=0
SkipParsingAfter=Total albums found:
SkipParsingAfter=Do you like
SkipParsingAfter={:2..10} days ago
SkipParsingAfter=1 day ago

level limit -> 4

file modification -> do not download existing files
(I have a script that runs in the background that deletes the main HTML file)

file filters
text -> download using url filter settings
images -> download from any site
other -> download using url filter settings

url exclusions
tape.php
*upic*.jpg
http*//*static*
http*://s*
http*altpubli
http*counter.rambler.ru
http*squid-cache
http*yandex.ru
http://*../

servers included keywords
*imgsrc*
b*.r
o*.ru

directory excluded keywords
/cat/
/members/
/main/
{:file=e:/imgsrsdnduser.txt} (manually entered username list to ignore)

excluded filename keywords
page
dlp.imgsrc.ru
{:file=e:\imgsrcdnd.txt} (currently I run a script that collects all the HTML filenames in all the directories and adds them to this file, so OE does not try to download anything it already has. I assumed that's what "do not download existing files" would do, but if I don't do this I get thousands of links to re-download.)

filename included keywords
a1*
.html*
&pwd=

limits
stop downloading if total number of files ... 500
stop downloading if time exceeds ... 6 min

parsing -> url substitutes urls/replace/with
http://imgsrc.ru/*/imgsrc.ru_ http://imgsrc.ru/*/ (blank)
* http_3a http:
http://b/*.us.icdn.ru b/*.us b*.us
http://o/*.us.icdn.ru o/*.us o*.us
http://*/imgsrc.ru_[a-zA-Z]+.jpg * (blank)

link conversion -> on-line