I want to extract all links which have the format:
from the site: http://hd-bb.org/viewforum.php?f=60
These are my steps:
I started a new project named:
Then I added the limit 1 and checked ONLY "Text" as File Filter
In "Content Filter" I added: http://www.megaupload.com and left everything else default there
In Advanced/Passwords, I added the username/password above.
However, Offline Explorer parses the threads, but it downloads 0 files.
How can I fix this and get what I want?
Remove the keyword from Contents Filter. Use URL Filters - Server and add two keywords to the Included list:
I think, this should work.
Best regards,
Oleg Chernavin
MP Staff
This is what I did:
I logged in with the internal browser.
Then created a new project with the wizard and checked under File Filters only "Text".
Then under "URL Filters" I added what you told me in the Server Tab.
Then I started the project, but I noticed that the queue gets bigger and bigger.
Therefore I added at URL Omissions, the following:
The idea is that only the topics are parsed. But even so, it takes a lot of time.
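The "only the topics" idea can be sketched outside Offline Explorer as a simple URL test. This is only an illustration of the filtering logic, not how Offline Explorer implements its URL Filters or Omissions; the function names are made up for the example, and it assumes topic pages have the form viewtopic.php?f=...&t=... with f=60 being the forum mentioned above.

```python
from urllib.parse import urlparse, parse_qs


def is_topic_url(url, forum_id="60"):
    """Return True if url is a viewtopic.php page of the given forum."""
    parts = urlparse(url)
    if not parts.path.endswith("/viewtopic.php"):
        return False
    # parse_qs returns a list of values for every query parameter
    query = parse_qs(parts.query)
    return query.get("f") == [forum_id]


def filter_queue(urls, forum_id="60"):
    """Keep only the topic pages of one forum; drop every other URL."""
    return [u for u in urls if is_topic_url(u, forum_id)]
```

A crawler would still have to load the viewforum.php listing pages to discover the topics; the test above only decides which discovered URLs are worth keeping.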
So, I would ask the following:
How can I extract the links of the format http://www.megaupload.com/* from the first post of each thread? In a timely fashion of course. How would you do it? I am just interested in a list containing just the links.
Thank you for your time and great support!
So, while being logged in with the internal browser and after downloading all topics on the server, will I be able to do a search for "http://www.megaupload.com/*", so that it outputs a list with links?
Sorry if this is a double post, but my last message wasn't posted by the forum.
I think TextPipe Pro will work for this task - please use the Tools - DataMining button in Offline Explorer Pro. This software is not easy to understand, but quite powerful.
I think, its trial mode will still allow you to make the extraction.
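For anyone who prefers a script to TextPipe Pro, here is a minimal Python sketch of the same extraction. It is not part of Offline Explorer or TextPipe; it assumes the downloaded pages are plain HTML files somewhere under one folder, and the function names are invented for the example. It collects every http://www.megaupload.com/ link on each page, not only those in the first post.

```python
import re
from pathlib import Path

# Matches http://www.megaupload.com/ followed by any characters that can
# belong to a link inside HTML (stops at whitespace, quotes, angle brackets).
LINK_RE = re.compile(r'http://www\.megaupload\.com/[^\s"\'<>]*')


def extract_links(text):
    """Return the unique matching links in text, in order of first appearance."""
    seen = []
    for link in LINK_RE.findall(text):
        if link not in seen:
            seen.append(link)
    return seen


def extract_from_folder(folder):
    """Scan every file under folder and collect the unique links found."""
    links = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            # errors="ignore" skips bytes that are not valid UTF-8
            text = path.read_text(encoding="utf-8", errors="ignore")
            for link in extract_links(text):
                if link not in links:
                    links.append(link)
    return links
```

Calling extract_from_folder with the project's download folder returns the list of links, which can then be printed or written to a text file.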
There is one problem:
There were 57000 files extracted. But none contains the data which you get after logging in to the forum. When I open a file with a browser, I am requested to log myself in. In the files on my hdd, there are no links at all. How can I download the posts which I see after I log in to the forum?
I am still having problems. Do you know what works?
If I go on a thread in the internal browser, like:
log in, and then right-click and choose "Offline Explorer: Download the current page", everything works as it should.
Now, how do I do it, so that I can get the same result with ALL threads in the section:
If I do the following, I don't get any data, only files which require me to log in to read the post:
1) I am going to http://hd-bb.org/viewforum.php?f=14 and log in with gorgonzola/qwerty
2) I click "New Project", add http://hd-bb.org/viewforum.php?f=14 as URL
3) I choose only text as File Filters
4) I keep everything else default and start the program
As a result, I get for instance files of the form viewtopic.php@f=14&t=*
But when I open these files in a browser, I am requested to log in to the forum. I am not getting the topic itself, but a page which requires me to log in. This happens with every topic!
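A quick way to confirm how many saved topics are really login pages is to scan the files for a login form. The sketch below is a rough check, not a feature of Offline Explorer. It assumes that a page containing a password input field is a login prompt and that the saved topic files start with viewtopic.php, as in the example above; both the marker and the function names are assumptions to adjust.

```python
from pathlib import Path


def looks_like_login_page(html, marker='type="password"'):
    """Heuristic: a page with a password field is assumed to be a login form."""
    return marker.lower() in html.lower()


def split_saved_pages(folder, pattern="viewtopic.php*"):
    """Return (ok, needs_login): names of saved topic files in each state."""
    ok, needs_login = [], []
    for path in sorted(Path(folder).rglob(pattern)):
        if not path.is_file():
            continue
        html = path.read_text(encoding="utf-8", errors="ignore")
        if looks_like_login_page(html):
            needs_login.append(path.name)
        else:
            ok.append(path.name)
    return ok, needs_login
```

If every file ends up in needs_login, the project was downloaded without the logged-in session, which matches the behaviour described here.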
What do I need to fix in order to get the same result I get when I save an individual topic with Offline Explorer?
I am sorry if I am annoying you, Oleg, but I think that Offline Explorer can really do what I want.
I am using the latest version. Could you try a test run? I am pretty sure, you will get the same results as me.
Unchecked All File Filters categories, except Text, as in your setup. URL Filters - Server, Directory - "Load only from the starting...", Filename - added the following to the Included list:
File Filters - Text - Ignore Logout Links box is checked. This worked OK for me. I downloaded 209 pages, all are in the logged on state.
In the Ribbon - Internet tab it is important to have checked the following:
Use MS Internet Explorer cookies
Use alternative connection method