http://www.metaproducts.com/mp/mpSupport_User_Forums_Topic.asp?topic=7
What products would I need to crawl this page for both the image and the metadata?
Thanks much!
Can you please explain it in more detail?
Thank you!
Best regards,
Oleg Chernavin
MP Staff
I mean the labeled fields of text on the pages, extracted into a spreadsheet, CSV, or some other format.
For example, on the following page, I would like the image AND the fielded text:
Target page: http://www.civilwar.nps.gov/cwss/petersburgd.cfm?id=1845
William K. Smith (First_Last)
Company: K
Unit Number: 106th
Rank:
State/Federal: Pennsylvania
Military Organization: Infantry
Date of Death: June 22, 1864
Original Burial Place: Fort Hell
Gravestone Number: 2975
Comments: Killed
Where the fields are:
Name:
Company:
Unit Number:
Rank:
State/Federal:
Military Organization:
Date of Death:
Original Burial Place:
Gravestone Number:
Comments:
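To show what "fielded into a CSV" could look like for the record above, here is a minimal Python sketch. It is not part of any MetaProducts product. It assumes the record is available as plain text with one "Label: value" line per field, and that the first line holds the name without a "Name:" label. The function names `parse_record` and `records_to_csv` are hypothetical.

```python
import csv
import io

# Field names as listed above. "Name" is assumed to come from the first
# line of the record, which has no "Label:" prefix.
FIELDS = [
    "Name", "Company", "Unit Number", "Rank", "State/Federal",
    "Military Organization", "Date of Death", "Original Burial Place",
    "Gravestone Number", "Comments",
]


def parse_record(text: str) -> dict:
    """Turn one record's 'Label: value' lines into a dict keyed by FIELDS."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    record = {name: "" for name in FIELDS}  # empty fields (e.g. Rank) stay ""
    record["Name"] = lines[0]
    for line in lines[1:]:
        # Split on the first colon only, so "June 22, 1864" stays intact.
        label, sep, value = line.partition(":")
        if sep and label.strip() in record:
            record[label.strip()] = value.strip()
    return record


def records_to_csv(records: list[dict]) -> str:
    """Write parsed records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

The csv module quotes values that contain commas, such as the date of death, so each record stays on one row when opened in a spreadsheet.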
OEP (Offline Explorer Pro) can download the page exactly as you see it online with these project settings:
Level: 0
"Text" and "Images":
    Location: Load only from the starting server
If you want to extract specific parts of the text from the downloaded pages, for example the fields above into a CSV file, you could use a tool like TextPipe.
Oleg.
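As an alternative to a dedicated tool for the extraction step, a minimal Python sketch (standard library only) could read a saved HTML page and pull out the labeled fields. This is not how OEP or TextPipe work internally. It assumes each field appears either as "Label: value" in one text run or as "Label:" followed by the value in the next run (for example, in separate table cells); the real page markup may differ. The names `TextLines`, `html_to_lines`, and `extract_fields` are hypothetical.

```python
from html.parser import HTMLParser


class TextLines(HTMLParser):
    """Collect the visible text runs of an HTML page, skipping scripts/styles."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and not self._skip_depth:
            self.lines.append(text)


def html_to_lines(html: str) -> list[str]:
    parser = TextLines()
    parser.feed(html)
    parser.close()
    return parser.lines


def extract_fields(lines: list[str], labels: set[str]) -> dict:
    """Pick 'Label: value' pairs out of the text runs."""
    found = {}
    for i, line in enumerate(lines):
        label, sep, value = line.partition(":")
        label = label.strip()
        if not sep or label not in labels:
            continue
        value = value.strip()
        if not value and i + 1 < len(lines):
            nxt = lines[i + 1]
            # The next run counts as the value only if it is not another label,
            # so an empty field such as "Rank:" stays empty.
            if nxt.partition(":")[0].strip() not in labels:
                value = nxt
        found[label] = value
    return found
```

Typical use would be to read each saved page with `open(path, encoding="utf-8").read()`, pass it through `html_to_lines`, and then call `extract_fields` with the field names from the question.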