Grabbing text content only
|Mark||07/07/2009 03:53 am|
What would be the easiest way to save web-pages as text files or plain HTML files with readable text only (I mean no images, graphics, forms, styles, etc. attached)?
I have a bunch of forum/article URLs which are like:
and am highly interested in getting text-only copies of the presented material and comments on my computer for an upcoming social studies project. I believe http://lists.swisslinux.org/pipermail/swisslinux-annonces/2007-December.txt could serve as a perfect example of the final result I need. Preserving online/offline links (i.e. clickable text) is not a priority; plain, de-tagged, human-readable text would be quite enough.
Thanks in advance.
|Oleg Chernavin||07/07/2009 01:07 pm|
I think it is possible with the following approach: install TextPipe Pro from www.datamystic.com and use it via Tools - Data Mining in Offline Explorer Enterprise. TextPipe has many filters; one of them strips the tags from HTML files, leaving only the pure text.
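As a do-it-yourself alternative for anyone without TextPipe, the tag-stripping step itself is simple to script. Below is a minimal sketch in Python using only the standard library's html.parser; the class and function names are my own invention, and it deliberately skips <script> and <style> contents so only human-readable text survives:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the human-readable text from an HTML page,
    ignoring everything inside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []       # text fragments collected so far
        self.skip_depth = 0   # >0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the plain, de-tagged text of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    # Collapse the whitespace runs left behind by removed tags
    return " ".join("".join(parser.parts).split())


print(html_to_text(
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>Hello, <b>world</b>!</p></body></html>"
))
# prints: Hello, world!
```

Run over each downloaded page, this produces roughly the kind of flat text file the pipermail example above shows; for batch use you would loop it over the files Offline Explorer has saved.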