Crawling for images and meta data

G. Jung 10/04/2004 12:45 pm
I own Offline Explorer Pro, but I would like to crawl PUBLIC DOMAIN pages like the following for both the images and meta data:

http://www.metaproducts.com/mp/mpSupport_User_Forums_Topic.asp?topic=7

What products would I need to crawl this page for both the image and meta data?

Thanks much!
Oleg Chernavin 10/04/2004 01:51 pm
Offline Explorer will easily download pages with images. But what do you mean by "meta data"?

Can you please explain it in more detail?

Thank you!

Best regards,
Oleg Chernavin
MP Staff
Len Lydik 10/04/2004 02:11 pm

I mean the fielded text on the pages, fielded into a spreadsheet, CSV or some other format.

For example, on the following page, I would like the image AND the fielded text:

Target page: http://www.civilwar.nps.gov/cwss/petersburgd.cfm?id=1845

William K. Smith (First_Last)
Company: K
Unit Number: 106th
Rank:
State/Federal: Pennsylvania
Military Organization: Infantry
Date of Death: June 22, 1864
Original Burial Place: Fort Hell
Gravestone Number: 2975
Comments: Killed


Where the fields are:

Name:
Company:
Unit Number:
Rank:
State/Federal:
Military Organization:
Date of Death:
Original Burial Place:
Gravestone Number:
Comments:
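
To make the request concrete, here is a rough Python sketch of turning fielded text like the record above into a CSV row. It is only an illustration, not a feature of Offline Explorer or TextPipe. It assumes the page text has already been saved as plain "Label: value" lines and that the first non-empty line is the name; the function names parse_record and records_to_csv are made up for this example.

```python
import csv
import io

# Field labels as listed in the post. "Name" has no label on the page,
# so the first non-empty line is assumed to be the name.
FIELDS = [
    "Name", "Company", "Unit Number", "Rank", "State/Federal",
    "Military Organization", "Date of Death", "Original Burial Place",
    "Gravestone Number", "Comments",
]

# The sample record quoted in the post.
SAMPLE = """William K. Smith (First_Last)
Company: K
Unit Number: 106th
Rank:
State/Federal: Pennsylvania
Military Organization: Infantry
Date of Death: June 22, 1864
Original Burial Place: Fort Hell
Gravestone Number: 2975
Comments: Killed
"""


def parse_record(text):
    """Turn 'Label: value' lines into a dict keyed by the known field labels."""
    record = dict.fromkeys(FIELDS, "")
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return record
    record["Name"] = lines[0]
    for line in lines[1:]:
        # Split on the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in record:
            record[label] = value.strip()
    return record


def records_to_csv(records):
    """Write the records as CSV text with one header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    print(records_to_csv([parse_record(SAMPLE)]))
```

Empty fields such as Rank come out as empty cells, and values that contain a comma (the date of death here) are quoted by the csv module so they stay in one column.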
10/04/2004 04:58 pm
Hmm, I still don't understand what the problem is.

OEP can download the page exactly as you see it online, i.e.:

Level: 0
"Text" and "Images":
Location: Load only from the starting server
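
In plain terms, these settings mean "fetch this one page plus the images it references on the same host". As a hedged illustration of that selection rule only (this is not how Offline Explorer is implemented, and the names ImageCollector and same_server_images are invented for the example), a small Python sketch using only the standard library:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def same_server_images(html, page_url):
    """Return absolute image URLs that live on the same host as page_url."""
    collector = ImageCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    # Resolve relative paths against the page URL before comparing hosts.
    urls = [urljoin(page_url, src) for src in collector.sources]
    return [url for url in urls if urlparse(url).netloc == host]
```

The function only decides which image URLs qualify; actually downloading them is left out of the sketch.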

If you want to extract some parts of the text, you could use a tool like TextPipe...
Oleg Chernavin 10/05/2004 12:45 am
Yes, you will need to set up TextPipe Pro to extract the data after you download the pages. Please go to the Tools | Data Mining menu for more details.

Oleg.