I want to extract all links that have the format:
http://www.megaupload.com/*
from the site: http://hd-bb.org/viewforum.php?f=60
username: gorgonzola
password: qwerty
These are my steps:
I started a new project named:
http://hd-bb.org/viewforum.php?f=60
Then I set the limit to 1 and checked ONLY "Text" under File Filters.
In "Content Filter" I added: http://www.megaupload.com and left everything else default there
In Advanced/Passwords, I added the username/password above.
However, Offline Explorer parses the threads but downloads 0 files.
How can I fix this and get what I want?
Jan
Remove the keyword from Contents Filter. Use URL Filters - Server and add two keywords to the Included list:
www.megaupload.com
hd-bb.org
I think this should work.
Best regards,
Oleg Chernavin
MP Staff
Hi!
This is what I did:
I logged in with the internal browser.
Then created a new project with the wizard and checked under File Filters only "Text".
Then under "URL Filters" I added what you told me in the Server Tab.
Then I started the project, but I noticed that the queue gets bigger and bigger.
Therefore I added at URL Omissions, the following:
http://hd-bb.org/memberlist.php?*
http://hd-bb.org/posting.php?*
http://hd-bb.org/report.php?*
http://hd-bb.org/search.php?*
http://hd-bb.org/ucp.php?*
The idea is that only the topics are parsed. But even so, it takes a lot of time.
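To check which URLs a list of omissions like this would skip, the patterns can be tried out in a small Python sketch. It assumes that `*` is the only wildcard and that everything else, including `?`, is matched literally; Offline Explorer's real matching rules may differ, and the function name `is_omitted` is made up for illustration.

```python
import re

# Omission patterns copied from the post above.
OMISSIONS = [
    "http://hd-bb.org/memberlist.php?*",
    "http://hd-bb.org/posting.php?*",
    "http://hd-bb.org/report.php?*",
    "http://hd-bb.org/search.php?*",
    "http://hd-bb.org/ucp.php?*",
]

def pattern_to_regex(pattern):
    # Assumption: "*" matches any run of characters; every other
    # character (including "?") is taken literally.
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))

COMPILED = [pattern_to_regex(p) for p in OMISSIONS]

def is_omitted(url):
    """Return True if the URL matches any omission pattern in full."""
    return any(rx.fullmatch(url) for rx in COMPILED)

if __name__ == "__main__":
    for url in [
        "http://hd-bb.org/memberlist.php?mode=viewprofile&u=2",
        "http://hd-bb.org/viewtopic.php?f=14&t=15024",
    ]:
        print(url, "->", "skip" if is_omitted(url) else "keep")
```

Under these assumptions the member list URL is skipped and the topic URL is kept, which is the intended effect of the omission list.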
So, I would ask the following:
How can I extract the links of the format http://www.megaupload.com/* from the first post of each thread? In a timely fashion of course. How would you do it? I am just interested in a list containing just the links.
Thank you for your time and great support!
Jan
Oleg.
So, while being logged in with the internal browser and after downloading all topics on the server, will I be able to do a search for "http://www.megaupload.com/*", so that it outputs a list with links?
Sorry if this is a double post, but my last message wasn't posted by the forum.
Jan
I think TextPipe Pro will work for this task. Please use the Tools - DataMining button in Offline Explorer Pro. This software is not easy to understand, but it is quite powerful.
I think its trial mode will still allow you to make the extraction.
Oleg.
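For readers who prefer a script to TextPipe, the extraction step could also be sketched in Python. This is only an illustration, assuming the topic pages were already downloaded in the logged-in state into a local folder; the folder argument and the function names are hypothetical and not part of Offline Explorer.

```python
import html
import re
from pathlib import Path

# Matches http://www.megaupload.com/ followed by anything up to
# whitespace, a quote, or an angle bracket.
MEGAUPLOAD_RE = re.compile(r"http://www\.megaupload\.com/[^\s\"'<>]+")

def extract_links(page_text):
    """Return megaupload links in order of first appearance, no duplicates."""
    text = html.unescape(page_text)  # turn &amp; back into & inside URLs
    seen = {}
    for link in MEGAUPLOAD_RE.findall(text):
        seen.setdefault(link, None)
    return list(seen)

def extract_from_folder(folder):
    """Scan every file under the folder and collect the unique links."""
    links = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            for link in extract_links(text):
                links.setdefault(link, None)
    return list(links)

if __name__ == "__main__":
    import sys
    # Usage: python extract_links.py <folder with downloaded pages>
    for link in extract_from_folder(sys.argv[1]):
        print(link)
```

The script scans whole pages, so it does not restrict itself to the first post of each thread; that would need parsing of the forum's HTML structure, which is not attempted here.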
Hi Oleg!
There is one problem:
There were 57000 files extracted. But none contains the data which you get after logging in to the forum. When I open a file with a browser, I am requested to log myself in. In the files on my hdd, there are no links at all. How can I download the posts which I see after I log in to the forum?
Oleg.
Hi!
I am still having problems. Do you know what works?
If I go on a thread in the internal browser, like:
http://hd-bb.org/viewtopic.php?f=14&t=15024
log in, and then right-click and choose "Offline Explorer: Download the current page", then everything works as it should.
Now, how do I get the same result for ALL threads in the section:
http://hd-bb.org/viewforum.php?f=14
If I do the following, I don't get any data, only files that require me to log in to read the post:
1) I go to http://hd-bb.org/viewforum.php?f=14 and log in with gorgonzola/qwerty
2) I click "New Project" and add http://hd-bb.org/viewforum.php?f=14 as the URL
3) I choose only Text under File Filters
4) I keep everything else at the defaults and start the project
As a result, I get for instance files of the form viewtopic.php@f=14&t=*
But when I open these files in a browser, I am requested to log in to the forum. I am not getting the topic itself, but a page which requires me to log in. This happens with every topic!
What do I need to fix in order to get the same result I get when I save an individual topic with Offline Explorer?
I am sorry if I am annoying you, Oleg, but I think that Offline Explorer can really do what I want.
Jan
Oleg.
I am using the latest version. Could you try a test run? I am pretty sure you will get the same results as I do.
http://hd-bb.org/viewforum.php?f=60
Level=1
Unchecked All File Filters categories, except Text, as in your setup. URL Filters - Server, Directory - "Load only from the starting...", Filename - added the following to the Included list:
viewtopic
File Filters - Text - Ignore Logout Links box is checked. This worked OK for me. I downloaded 209 pages, all are in the logged on state.
In the Ribbon - Internet tab it is important to have checked the following:
Use MS Internet Explorer cookies
Use alternative connection method
Oleg.
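The effect of the filter settings described above can be pictured with a short Python sketch. It is only a model of the filter logic, not Offline Explorer's implementation: it assumes the Server setting keeps the download on the starting host and that the Filename keyword must appear in the file name part of the URL. The name `should_download` is invented for the example, and the Directory setting is not modelled.

```python
from urllib.parse import urlsplit

START_URL = "http://hd-bb.org/viewforum.php?f=60"
FILENAME_KEYWORDS = ["viewtopic"]  # the Included list from the post above

def should_download(url, start_url=START_URL, keywords=FILENAME_KEYWORDS):
    """Model of the filters: same host as the start URL, keyword in file name."""
    target = urlsplit(url)
    start = urlsplit(start_url)
    # Assumed Server rule: stay on the host of the starting URL.
    if target.netloc.lower() != start.netloc.lower():
        return False
    # Filename rule: the last path segment must contain an included keyword.
    filename = target.path.rsplit("/", 1)[-1]
    return any(keyword in filename for keyword in keywords)

if __name__ == "__main__":
    for url in [
        "http://hd-bb.org/viewtopic.php?f=60&t=15024",
        "http://hd-bb.org/memberlist.php?mode=viewprofile&u=2",
        "http://www.megaupload.com/?d=ABC123",
    ]:
        print(url, "->", should_download(url))
```

In this model only the viewtopic pages pass, which fits a short queue, since member lists, search pages and external hosts are never requested. The starting page itself is assumed to be loaded regardless of the filters.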