www.rottentomatoes.com

Wilbur
03/20/2004 01:39 am
I am currently working on a linguistics project involving movie reviews. I would like to download two different sets of reviews from rottentomatoes for each movie which the site lists as either fresh or rotten. I have tried to limit to these criteria but unfortunately I keep either getting too many pages or too few.
Here are two links to the web pages that list the links to the reviews:
http://www.rottentomatoes.com/m/ThePassionoftheChrist-1129941/reviews.php?critic=fresh&sortby=default&page=1
http://www.rottentomatoes.com/m/ThePassionoftheChrist-1129941/reviews.php?critic=rotten&sortby=default&page=1

There are four different pages of links to reviews, which differ only in the last number, and each page links to the others.
I have tried numerous times with different settings, including limiting keywords, servers, and directories.
I think the biggest problem is that the links connect via PHP to sites outside of the www.rottentomatoes.com server and import the reviews into a frame.
Instead of just getting the reviews, I either do not get enough pages or I get the entire site.
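
Since the listing pages described above differ only in the critic value and the final page number, the starting URLs for both sets can be written out mechanically. The following is a minimal sketch, assuming four pages per set as stated above; the function name listing_urls is hypothetical and only for illustration.

```python
# Base address of the review listing, copied from the two links above.
BASE = "http://www.rottentomatoes.com/m/ThePassionoftheChrist-1129941/reviews.php"


def listing_urls(critic, pages=4):
    """Return the listing page URLs for one set ("fresh" or "rotten").

    The page count of 4 is an assumption taken from the post; other
    movies may have more or fewer pages.
    """
    return [
        f"{BASE}?critic={critic}&sortby=default&page={n}"
        for n in range(1, pages + 1)
    ]


fresh_pages = listing_urls("fresh")
rotten_pages = listing_urls("rotten")

for url in fresh_pages + rotten_pages:
    print(url)
```

Each list could then serve as the set of starting addresses for one download, which keeps the fresh and rotten sets separate.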

Here are examples of the links to one review:
http://www.rottentomatoes.com/click/movie-1129941/reviews.php?critic=rotten&sortby=default&page=1&rid=1253646
http://www.msnbc.msn.com/id/4338528/
And here is the frame source for the review:
<frame src="reviews_viewer.php?object=movie&id=1129941&critic=rotten&sortby=default&page=1&rid=1253646" name=viewer scrolling=auto noresize marginwidth=0 marginheight=0>

I am going to keep banging my head against the computer till it works. I would appreciate any help.

Thanks,

Wilbur



Oleg Chernavin
03/22/2004 11:38 am
Well, the site is tricky, but there is a way to avoid loading much extra stuff.

I selected loading from all servers and directories in URL Filters. URL Filters | Filename should have the following Custom Configuration:

Included keywords list:

reviews.php?critic=fresh
reviews_viewer.php
reviews_control.php
http://#www.rottentomatoes.com/*/*

The first three keywords allow loading the review pages on the starting server. All other files from the www.rottentomatoes.com site will not be loaded.

The last keyword allows loading all files from all directories on any site other than www.rottentomatoes.com.
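
To make the effect of these keywords easier to check, here is a rough Python sketch of the rule as described above. It is only an approximation for illustration, not how Offline Explorer actually matches keywords; the function name should_load and the plain substring test are assumptions.

```python
from urllib.parse import urlsplit

# Host of the starting server, from the thread.
START_HOST = "www.rottentomatoes.com"

# The first three entries of the Included keywords list, treated here
# as plain substrings (an assumption about the matching).
INCLUDED_SUBSTRINGS = (
    "reviews.php?critic=fresh",
    "reviews_viewer.php",
    "reviews_control.php",
)


def should_load(url):
    """Approximate the filter: any other site passes; on the starting
    server only URLs containing one of the keywords pass."""
    host = (urlsplit(url).hostname or "").lower()
    if host != START_HOST:
        return True  # mirrors the last keyword: every site except the starting one
    return any(keyword in url for keyword in INCLUDED_SUBSTRINGS)


examples = [
    "http://www.rottentomatoes.com/m/ThePassionoftheChrist-1129941/reviews.php?critic=fresh&sortby=default&page=1",
    "http://www.rottentomatoes.com/m/ThePassionoftheChrist-1129941/reviews.php?critic=rotten&sortby=default&page=1",
    "http://www.msnbc.msn.com/id/4338528/",
    "http://www.rottentomatoes.com/",
]
for url in examples:
    print(should_load(url), url)
```

Under this sketch the rotten listing is rejected, so for the rotten set the first keyword would presumably need to read reviews.php?critic=rotten instead.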

You will get just a few extra links this way. You can easily abort them using the Queue tab: start the Project download, press the F9 key to suspend it, switch to the Queue tab, and wait until the first file is loaded. You will then see a list of links that Offline Explorer is going to download. Abort the unwanted ones and press F9 again to resume the download.

Best regards,
Oleg Chernavin
MP Staff

Wilbur
03/28/2004 05:46 am
Thanks a lot for the help, Oleg. Your information allowed me to quickly gather the materials for my research project. I also commend the effort behind the products and customer support. This level of commitment from a company is a pleasure to witness.

Thanks,

Wilbur

Oleg Chernavin
03/28/2004 01:10 pm
Thank you for your kind words! I am really glad to help you!

Oleg.