Grabbing text content only

Mark 07/07/2009 03:53 am
Hello, Oleg.

What would be the easiest way to save web pages as text files, or as plain HTML files containing readable text only (i.e. with no images, graphics, forms, styles, etc. attached)?
I have a bunch of forum/article URLs like these:

http://skus1.free.fr/spip.php?article324
http://www.vibrisse.net/spip.php?article7
http://www.david-tate.fr/spip.php?article984
http://monsitespip.com/spip.php?article28

I am highly interested in getting text-only copies of the presented material and comments onto my computer for my upcoming socials project. I believe http://lists.swisslinux.org/pipermail/swisslinux-annonces/2007-December.txt could serve as a perfect example of what I need as the final result. Preserving online/offline links (i.e. clickable text) is not necessarily a priority; plain, de-tagged, human-readable text would be quite enough.

Thanks in advance.
Kind regards
Mark
Oleg Chernavin 07/07/2009 01:07 pm
I think it is possible with the following approach: install TextPipe Pro from www.datamystic.com and use Tools - Data Mining in Offline Explorer Enterprise. TextPipe has many filters; one of them strips the tags from HTML files, leaving only the pure text.
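
If you prefer a do-it-yourself route, here is a minimal sketch of the same idea, assuming Python 3 with only the standard library; the URL list and output file names are placeholders for your own. It is a rough stand-in for TextPipe's HTML-cleaning filter, not the actual TextPipe/Offline Explorer workflow:

from html.parser import HTMLParser
from urllib.request import urlopen

class TextExtractor(HTMLParser):
    """Collects readable text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # >0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text runs outside scripts/styles.
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())

# Placeholder list -- substitute your own URLs here.
urls = [
    "http://skus1.free.fr/spip.php?article324",
    "http://www.vibrisse.net/spip.php?article7",
]

for i, url in enumerate(urls, start=1):
    # errors="replace" because the page encoding (often Latin-1 on
    # French sites) may not be UTF-8; adjust if you know the charset.
    html = urlopen(url).read().decode("utf-8", errors="replace")
    extractor = TextExtractor()
    extractor.feed(html)
    with open("article%d.txt" % i, "w", encoding="utf-8") as out:
        out.write("\n".join(extractor.parts))

Each output file then contains one line per text run, similar in spirit to the plain-text mailing-list archive you linked.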

Best regards,
Oleg Chernavin
MP Staff