XHTML link tag stripping
2006-11-29 - By Robert Houben
I happened to have something floating around that was close to what you asked for, so I modified it and have included it here. Note that it doesn't normalize the whitespace:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="no"/>

  <!-- Kill these in the output tree -->
  <xsl:template match="a">
  </xsl:template>

  <xsl:template match="/">
    <htmltext>
      <xsl:apply-templates />
    </htmltext>
  </xsl:template>

  <!--
    For all other node types, process the attributes and child nodes
    (nothing is copied; text reaches the output through the built-in
    text template) and emit a separating space.
  -->
  <xsl:template match="*|processing-instruction()|comment()">
    <xsl:choose>
      <xsl:when test="not(node())"><xsl:apply-templates select="@*"/><xsl:text> </xsl:text></xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="@*|node()"/><xsl:text> </xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!--
    For all attributes, emit only a space; the attribute value is not copied.
  -->
  <xsl:template match="@*">
    <xsl:apply-templates /><xsl:text> </xsl:text>
  </xsl:template>
</xsl:stylesheet>
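
For the normalised whitespace asked for in the original question, a minimal sketch of one possible approach in XSLT 1.0 follows. It assumes the source elements are in no namespace, as in the example source (a document that declares the XHTML namespace would need prefixed match patterns), and it starts from html/body so that the title text is skipped. The variable name raw is purely illustrative.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="no"/>

  <xsl:template match="/">
    <!-- Collect the wanted text into one variable so it can be normalized as a single string -->
    <xsl:variable name="raw">
      <xsl:apply-templates select="html/body"/>
    </xsl:variable>
    <htmltext><xsl:value-of select="normalize-space($raw)"/></htmltext>
  </xsl:template>

  <!-- Drop every link element together with its content -->
  <xsl:template match="a"/>

  <!-- Emit each remaining text node followed by a space, so adjacent pieces stay separated -->
  <xsl:template match="text()">
    <xsl:value-of select="."/>
    <xsl:text> </xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With the example source from the question, this should produce the single htmltext element shown there: normalize-space() collapses each run of whitespace to one space and trims both ends, while the space emitted after every text node keeps text from neighbouring elements apart.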
HTH,
________________________________
From: Peter Hollas [mailto:peterhollas@(protected)]
Sent: Wednesday, November 29, 2006 3:50 AM
To: xalan-j-users@(protected)
Subject: XHTML link tag stripping
Hi everyone,
Please could someone provide an example stylesheet showing how to strip <a> link tags out of a source XHTML document whilst retaining the remaining node text from within the body? Preferably the output should have normalised whitespace and a space separating each extracted piece of text, e.g.
Source:
<html>
<head>
<title>Not wanted</title>
</head>
<body>
<a>Not wanted</a>
<div class="1">This text is wanted <a href="#">Not wanted</a> and so is this</div>
<p>Wanted</p>
</body>
</html>
Output:
<htmltext>This text is wanted and so is this Wanted</htmltext>
I'm sure that the solution is incredibly simple, but after days of trying I keep hitting a brick wall.
Many thanks, Peter.