Crawling for images and meta data
|G. Jung||10/04/2004 12:45 pm|
|I own Offline Explorer Pro, but I would like to crawl PUBLIC DOMAIN pages like the following for both the images and meta data:
What products would I need to crawl this page for both the image and meta data?
|Oleg Chernavin||10/04/2004 01:51 pm|
|Offline Explorer will easily download pages with images. But what do you mean by "meta data"?
Can you please explain in detail?
|Len Lydik||10/04/2004 02:11 pm|
I mean the fielded text on the pages, fielded into a spreadsheet, CSV or some other format.
For example, on the following page, I would like the image AND the fielded text:
<a href="http://www.civilwar.nps.gov/cwss/petersburgd.cfm?id=1845">target page</a>
William K. Smith (First_Last)
Unit Number: 106th
Military Organization: Infantry
Date of Death: June 22, 1864
Original Burial Place: Fort Hell
Gravestone Number: 2975
Where the fields are:
Date of Death:
Original Burial Place:
|10/04/2004 04:58 pm|
|Hmm, I still don't understand what the problem is.
OEP can download the page exactly as you see it online, i.e.:
"Text" and "Images":
Location: Load only from the starting server
If you want to extract some parts of the text, you could use a tool like TextPipe...
|Oleg Chernavin||10/05/2004 12:45 am|
|Yes, you will need to set up TextPipe Pro to extract the data after you download the pages. Please go to the Tools | Data Mining menu for more details.
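
For readers who want to see what the extraction step amounts to, here is a minimal Python sketch. It is not TextPipe's method, only an illustration of turning "Label: value" text into CSV rows. It assumes the downloaded page has already been reduced to plain text lines like the example record above; the field list, the `parse_record` and `records_to_csv` names, and the sample text are all assumptions made for this illustration. Lines without a known label (such as the name line) are simply skipped.

```python
import csv
import io

# Field labels copied from the example record in this thread (an assumption:
# real pages may use more or different labels).
FIELDS = [
    "Unit Number",
    "Military Organization",
    "Date of Death",
    "Original Burial Place",
    "Gravestone Number",
]


def parse_record(text):
    """Turn 'Label: value' lines into a dict; lines with unknown labels are ignored."""
    record = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in FIELDS:
            record[label] = value.strip()
    return record


def records_to_csv(records):
    """Serialize a list of record dicts to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# Sample text mirroring the record quoted earlier in the thread.
SAMPLE = """William K. Smith
Unit Number: 106th
Military Organization: Infantry
Date of Death: June 22, 1864
Original Burial Place: Fort Hell
Gravestone Number: 2975"""

if __name__ == "__main__":
    print(records_to_csv([parse_record(SAMPLE)]))
```

Values that contain a comma, such as the date of death, are quoted automatically by the `csv` module, so the output should open cleanly in a spreadsheet.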