Creative Juices Bo. Co.


Using RegEx to Remove Line-Breaks, White-Space From HTML (Except PRE Tags)

Remove White-Space from HTML, Except Inside <pre> Tags

If you have ever used ColdFusion's <cfprocessingdirective suppresswhitespace="yes"> tag, you may have noticed that it also strips the white-space formatting from any <pre> blocks in your HTML. 99% of the time you would never have this problem, but if you run a programming blog, where code samples depend on that formatting, it can be somewhat of a hassle.

Well, as I was re-writing my cjboco.com site, I decided to take up the challenge and create a function that uses RegEx to strip line-breaks and tabs from HTML while ignoring anything between <pre> tags. It can also remove HTML comments and /* */ style comments if you pass true for the optional remcoms argument (it defaults to false). Since I was using this to strip all the white-space out of my generated HTML, though, I soon realized the comment removal wasn't working too well with my conditional comments for Internet Explorer. Oh well, at least it's in here for future use. Let me know if you have any problems.

<cffunction name="htmlRemoveWhiteSpace" returntype="string" output="no" hint="Removes line-breaks and tabs from HTML, except inside <pre> blocks; optionally removes comments">
   <cfargument name="input" type="string" required="yes" />
   <cfargument name="remcoms" type="boolean" required="no" default="false" />
   <cfset var locvar = StructNew() />
   <cfset locvar.str = arguments.input />
   <cfif Len(locvar.str) gt 0>
      <cftry>
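         <!--- match <pre> with or without attributes (e.g. <pre class="code">), not just a bare <pre> --->
         <cfif REFindNoCase('<pre(\s[^>]*)?>', locvar.str) gt 0>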
            <cfset locvar.newstr = "" />
            <cfset locvar.pos = 1 />
            <cfset locvar.is_done = false />
            <cfloop condition="NOT locvar.is_done">
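               <!--- find the next <pre>...</pre> block; [\s\S]*? spans line-breaks and also allows an empty block --->
               <cfset locvar.subex = REFindNoCase('<pre(\s[^>]*)?>[\s\S]*?</pre>', locvar.str, locvar.pos, true) />
               <cfif locvar.subex.len[1] eq 0>
                  <cfset locvar.is_done = true />
               <cfelse>
                  <!--- strip line-breaks and tabs from the HTML between the previous <pre> block and this one --->
                  <cfset locvar.html_str = "" />
                  <cfif locvar.subex.pos[1] gt locvar.pos>
                     <cfset locvar.html_str = ReReplace(Mid(locvar.str, locvar.pos, locvar.subex.pos[1] - locvar.pos), '[\r\n\t]+', '', 'ALL') />
                  </cfif>
                  <cfif arguments.remcoms>
                     <!--- replace all the comments --->
                     <cfset locvar.html_str = ReReplace(locvar.html_str, '<!--.*?-->', '', 'ALL') />
                     <cfset locvar.html_str = ReReplace(locvar.html_str, '/\*.*?\*/', '', 'ALL') />
                  </cfif>
                  <!--- copy the <pre> block through untouched, then keep searching after it --->
                  <cfset locvar.pre = Mid(locvar.str, locvar.subex.pos[1], locvar.subex.len[1]) />
                  <cfset locvar.newstr = locvar.newstr & locvar.html_str & locvar.pre />
                  <cfset locvar.pos = locvar.subex.pos[1] + locvar.subex.len[1] />
               </cfif>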
            </cfloop>
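            <!--- strip the HTML after the last <pre> block, if any (the page may end right at a </pre>) --->
            <cfif locvar.pos lte Len(locvar.str)>
               <cfset locvar.html_str = ReReplace(Mid(locvar.str, locvar.pos, Len(locvar.str) - locvar.pos + 1), '[\r\n\t]+', '', 'ALL') />
               <cfif arguments.remcoms>
                  <!--- the tail needs its comments removed too --->
                  <cfset locvar.html_str = ReReplace(locvar.html_str, '<!--.*?-->', '', 'ALL') />
                  <cfset locvar.html_str = ReReplace(locvar.html_str, '/\*.*?\*/', '', 'ALL') />
               </cfif>
               <cfset locvar.newstr = locvar.newstr & locvar.html_str />
            </cfif>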
            <cfset locvar.str = locvar.newstr />
         <cfelse>
            <cfset locvar.str = ReReplace(locvar.str,"[\r\n\t]+","","ALL") />
            <cfif arguments.remcoms>
               <!--- replace all the comments --->
               <cfset locvar.str = ReReplace(locvar.str, '<!--.*?-->', '', 'ALL') />
               <cfset locvar.str = ReReplace(locvar.str, '/\*.*?\*/', '', 'ALL') />
            </cfif>
         </cfif>
         <cfcatch type="any">
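            <!--- on error, fall back to the original, unstripped HTML instead of returning the error message --->
            <cfset locvar.str = arguments.input />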
         </cfcatch>
      </cftry>
   </cfif>
   <cfreturn locvar.str />
</cffunction>
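Here is a quick usage sketch, assuming htmlRemoveWhiteSpace() from above is already defined on the same page (pasted in or pulled in with <cfinclude>). The sample markup and the variable names pageHtml, compact and compactNoComments are made up for illustration. The page is captured with <cfsavecontent>, then run through the function once with the default settings and once with remcoms set to true.

```cfml
<!--- assumption: htmlRemoveWhiteSpace() from above is defined on this page --->
<cfsavecontent variable="pageHtml">
<html>
  <body>
    <!--[if IE]><p>You are using Internet Explorer.</p><![endif]-->
    <p>Some text</p>
<pre>
line one
    indented line two
</pre>
  </body>
</html>
</cfsavecontent>

<!--- default (remcoms = false): line-breaks and tabs outside the <pre> are removed, comments are kept --->
<cfset compact = htmlRemoveWhiteSpace(pageHtml) />

<!--- remcoms = true: comments are removed as well, including the IE conditional comment --->
<cfset compactNoComments = htmlRemoveWhiteSpace(pageHtml, true) />

<cfoutput>#compact#</cfoutput>
```

With the defaults, everything outside the <pre> block comes back on a single line (only line-breaks and tabs are removed, so runs of spaces stay), while the two lines inside the <pre> keep their breaks and indentation. The second call also drops the conditional comment, which is exactly the Internet Explorer problem mentioned above.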