UNOFFICIAL HTML-MODE PATCH FOR ISPELL 3.1.18/3.1.20

What does the patch do?

This patch adds a html-mode to ispell. A patched copy of ispell
ignores any mark-up tags or html entities in a html document when
spell checking that document. Text inside an 'alt' attribute is,
however, still checked.

What is Ispell anyway?

Ispell is a fast screen-oriented spelling checker that shows you your
errors in the context of the original file, and suggests possible
corrections when it can figure them out. Compared to UNIX spell, it is
faster and much easier to use. Ispell can also handle languages other
than English.

[taken from the ispell README file]

Where can I get the Ispell package?

Ispell 3.1 kits are available for anonymous ftp from a number of
sites. Check Archie for the string "ispell-3.1". The URL for the
master archive site (with IP numbers following) is:

    ftp://ftp.cs.ucla.edu/pub/ispell-3.1/ispell-3.1.20.tar.gz (131.179.240.10)
    ftp://ftp.math.orst.edu/pub/ispell-3.1/ispell-3.1.20.tar.gz (128.193.80.161)

The following European sites mirror ispell. If you can't find the
latest version there, it probably just hasn't been mirrored yet:

    ftp://ftp.th-darmstadt.de/pub/dicts/ispell/ispell-3.1.20.tar.gz (130.83.55.75)
    ftp://ftp.nl.net:/pub/textproc/ispell/ispell-3.1.20.tar.gz (193.78.240.13)
    ftp://ftp.ibp.fr:/pub/ispell/ispell-3.1.20.tar.gz

You can also locate ispell archive sites via the ispell home page:

    http://www.cs.ucla.edu/ficus-members/geoff/ispell.html

How do I install the patch?

You first need to get the source to ispell (the patch should work
with versions 3.1.18 and 3.1.20). Uncompress and untar the ispell
distribution, then cd to the ispell-3.1 directory which should have
been created. Run the following command to apply the patch:

    patch < path_to_directory_containing_this_file/this_filename

You can then install ispell as normal (see the README file included
in the ispell distribution for details).

How do I use it?

The patched version of ispell should automatically enter html-mode
whenever checking a file with a .htm or .html extension. You can also
explicitly enter html-mode by using the -h command line option (see
man page). If you want to spell-check a file with a .htm or a .html
extension without treating it as a html file, simply use either the
-t or -n command line options.

Examples:

    ispell index.html       # html tags will be ignored
    ispell -h README        # html tags will be ignored
    ispell -n index.html    # html tags will be spell-checked

What do I do if I find any bugs?

If you find a bug and you feel that it is due to the html-mode-patch
then please send an email to gtierney@nova.ucd.ie explaining what you
think is wrong. Ispell bug reports in general should be sent to
ispell-bugs@itcorp.com unless they are related to Emacs, in which case
the address to send reports to is ispell-el-bugs@itcorp.com.

Is there any warranty?

In a word: NO. As the developer of ispell, Geoff Kuenning, states:

    THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS
    IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
    FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GEOFF
    KUENNING OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
    (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
    SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
    HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
    CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
    OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
    EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Please note however that Geoff Kuenning has no responsibility for this
patch (except for writing much of the ispell code on which it is
based) so if anything goes wrong, you are not even justified in
bearing a grudge against him or any of the other contributors.
Geoff Kuenning should not be seen as endorsing this patch in any way,
shape or form.

Can I redistribute or change the code?

As far as I am concerned you can do whatever you like with the code.
However, since the code is based around Geoff Kuenning's code, you are
constrained by his redistribution and use restrictions. These can be
found at the start of any of the source code files in the ispell
distribution.

---------------C-U-T------H-E-R-E-----------------
-- (actually there's no need to cut, patch will --
-- ignore the above text anyway.)              --
--------------------------------------------------
*** correct.c.orig Thu Oct 12 22:04:06 1995
--- correct.c Tue Dec 16 22:50:00 1997
***************
*** 233,238 ****
--- 233,241 ----
      int bufsize;
      int ch;
  
+     /* line added by Gerry Tierney */
+     insidehtml = 0;
+
      for (bufno = 0; bufno < contextsize; bufno++)
          contextbufs[bufno][0] = '\0';
  
*** defmt.c.orig Thu Oct 12 22:04:06 1995
--- defmt.c Tue Dec 16 22:50:03 1997
***************
*** 160,165 ****
--- 160,166 ----
  static int save_math_mode;
  static char save_LaTeX_Mode;
  
+ /* parameters changed by Gerry Tierney to include the output file */
  static char * skiptoword (bufp) /* Skip to beginning of a word */
      char * bufp;
      {
***************
*** 170,175 ****
--- 171,223 ----
          || (tflag && (math_mode & 1)))
        )
          {
+         /* Start of modifications by Gerry Tierney */
+         /* We first check for an end-quote character if we are checking
+            inside of an alt attribute. If we find one we ignore the
+            rest of the tag */
+         if (insidehtml == -1 && *bufp == '\"')
+           {
+             insidehtml = 0;   /* tag is closed unless we hit EOL below */
+             while (*bufp != '>' && *bufp != '\0')
+               bufp++;
+             if (*bufp == '\0')
+               insidehtml = 1;
+           }
+
+         /* If we are checking a html file we want to ignore any
+            HTML tags. These should start with a '<'
+            and end with a '>' so we simply skip over anything
+            between these two symbols.
+            If the line ends before a matching '>', we set 'insidehtml' */
+         if (htmlflag == 1 && *bufp == '<')
+           {
+             /* Found start of html tag - Skip to end of tag or EOL */
+             while (*bufp != '>' && *bufp != '\0' &&
+                    strncasecmp(bufp,"alt=\"",5) != 0)
+               bufp++;
+             /* If we find an alt attribute, we want to check its text */
+             if (strncasecmp(bufp,"alt=\"",5) == 0)
+               {
+                 insidehtml=-1;
+                 bufp = bufp + 4;
+               }
+             else if (*bufp == '\0')
+               /* we've reached EOL without closing the tag */
+               insidehtml = 1;
+           }
+
+         /* Skip over quoted entities such as &quot;
+            These all start with an ampersand and
+            end with a semi-colon. We do not need
+            to worry about them extending over more than one line */
+         if (htmlflag == 1 && *bufp == '&')
+           {
+             while (*bufp != ';' && *bufp != '\0')
+               bufp++;
+           }
+         /* End of modifications by Gerry Tierney */
+
+
          /* check paren necessity... */
          if (tflag) /* TeX or LaTeX stuff */
              {
***************
*** 389,395 ****
      if (hadlf)
          contextbufs[0][len] = 0;
  
!     if (!tflag)
          {
          /* skip over .if */
          if (*currentchar == NRDOT
--- 437,444 ----
      if (hadlf)
          contextbufs[0][len] = 0;
  
!     /* Conditions modified by Gerry Tierney to handle html-mode */
!     if (!tflag && htmlflag != 1)
          {
          /* skip over .if */
          if (*currentchar == NRDOT
***************
*** 426,432 ****
  
  
      /* if this is a formatter command, skip over it */
!     if (!tflag && *currentchar == NRDOT)
          {
          while (*currentchar && !myspace (chartoichar (*currentchar)))
              {
--- 475,482 ----
  
  
      /* if this is a formatter command, skip over it */
!     /* Conditions modified by Gerry Tierney to handle html-mode */
!
      if (!tflag && htmlflag != 1 && *currentchar == NRDOT)
          {
          while (*currentchar && !myspace (chartoichar (*currentchar)))
              {
***************
*** 441,447 ****
--- 491,531 ----
              return;
              }
          }
+
  
+     /* Start of modifications by Gerry Tierney */
+
+     /* If we are checking a html file and we have been left with
+        an open tag from a previous line, then we ignore everything
+        from the start of the line until we either reach the end of
+        the line or we close the tag */
+
+     if (htmlflag == 1 && insidehtml == 1)
+       {
+         while (*currentchar != '>' && *currentchar != '\0')
+           {
+             /* We check for an alt attribute (found inside img
+                tags). We want to spell check its text so if
+                we find one, we switch out of html-mode until we
+                find the next quote character. We signal this
+                state by setting the insidehtml flag to -1 */
+             if (strncasecmp(currentchar,"alt=\"",5) == 0)
+               {
+                 copyout(&currentchar,5);
+                 insidehtml = -1;
+                 break;
+               }
+
+             (void) putc (*currentchar, ofile);
+             currentchar++;
+           }
+         if (*currentchar == '>')
+           /* We've closed the tag so we reset the flag */
+           insidehtml = 0;
+
+       }
+     /* End of modifications by Gerry Tierney */
+
      for ( ; ; )
          {
          p = skiptoword (currentchar);
*** ispell.1X.orig Mon Jan 23 21:28:25 1995
--- ispell.1X Tue Dec 16 22:50:02 1997
***************
*** 38,43 ****
--- 38,46 ----
  .\" SUCH DAMAGE.
  .\"
  .\" $Log: ispell.1X,v $
+ .\"
+ .\" Documentation for html-mode added by Gerry Tierney 10/14/1995
+ .\"
  .\" Revision 1.80 1995/01/08 23:23:31 geoff
  .\" Document the new personal-dictionary behavior (dictionary named after
  .\" the hash file is preferred).
***************
*** 110,115 ****
--- 113,119 ----
  .IP \fIcommon-flags\fP:
  .RB [ \-t ]
  .RB [ \-n ]
+ .RB [ \-h ]
  .RB [ \-b ]
  .RB [ \-x ]
  .RB [ \-B ]
***************
*** 296,301 ****
--- 300,307 ----
  The input file is in TeX or LaTeX format.
  .IP \fB\-n\fR
  The input file is in nroff/troff format.
+ .IP \fB\-h\fR
+ The input file is in html format.
  .IP \fB\-b\fR
  Create a backup file by appending ".bak"
  to the name of the input file.
***************
*** 337,344 ****
  .RB ( \-n )
  or TeX/LaTeX
  .RB ( \-t )
! input mode.
! (The default is controlled by the DEFTEXFLAG installation option.)
  TeX/LaTeX mode is also automatically selected if an input file has the
  extension ".tex", unless overridden by the
  .B \-n
--- 343,354 ----
  .RB ( \-n )
  or TeX/LaTeX
  .RB ( \-t )
! input mode. (This does not work for html
! .RB ( \-h )
! mode. However, html-mode is assumed for any files with a ".html"
! or ".htm" extension unless nroff/troff or TeX/LaTeX modes have
! been explicitly defined.)
! (The default mode is controlled by the DEFTEXFLAG installation option.)
  TeX/LaTeX mode is also automatically selected if an input file has the
  extension ".tex", unless overridden by the
  .B \-n
*** ispell.c.orig Thu Oct 12 22:04:07 1995
--- ispell.c Tue Dec 16 22:50:02 1997
***************
*** 298,304 ****
           *    ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789
           *    ^^^^       ^^^ ^  ^^ ^^
           *    abcdefghijklmnopqrstuvwxyz
!          *    ^^^^^^     ^^^ ^  ^^ ^^^
           */
          arglen = strlen (*argv);
          switch ((*argv)[1])
--- 298,306 ----
           *    ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789
           *    ^^^^       ^^^ ^  ^^ ^^
           *    abcdefghijklmnopqrstuvwxyz
!          *    ^^^^^^ ^   ^^^ ^  ^^ ^^^
!          *
!
           * -h flag used by Gerry Tierney for html-mode
           */
          arglen = strlen (*argv);
          switch ((*argv)[1])
***************
*** 488,493 ****
--- 490,496 ----
                  if (arglen > 2)
                      usage ();
                  tflag = 0;      /* nroff/troff mode */
+                 htmlflag = -1;  /* non-html mode */
                  deftflag = 0;
                  if (preftype == NULL)
                      preftype = "nroff";
***************
*** 496,505 ****
--- 499,519 ----
                  if (arglen > 2)
                      usage ();
                  tflag = 1;
+                 htmlflag = -1;  /* non-html mode */
                  deftflag = 1;
                  if (preftype == NULL)
                      preftype = "tex";
                  break;
+
+             /* -h option to enable HTML-mode added by Gerry Tierney */
+             case 'h':
+                 if (arglen > 2)
+                     usage ();
+                 tflag = 0;      /* non-TeX mode */
+                 deftflag = 0;
+                 htmlflag = 1;   /* Html-Mode */
+                 break;
+
+
              case 'T':           /* Set preferred file type */
                  p = (*argv)+2;
                  if (*p == '\0')
***************
*** 810,816 ****
      if (tflag < 0)
          tflag =
            (cp = rindex (filename, '.')) != NULL
              && strcmp (cp, ".tex") == 0;
!     if (prefstringchar < 0)
          {
          defdupchar =
--- 824,830 ----
      if (tflag < 0)
          tflag =
            (cp = rindex (filename, '.')) != NULL
              && strcmp (cp, ".tex") == 0;
!     if (prefstringchar < 0)
          {
          defdupchar =
***************
*** 818,823 ****
--- 832,845 ----
          if (defdupchar < 0)
              defdupchar = 0;
          }
+
+     /* Modification by Gerry Tierney to set html-mode
+      * based on file extension */
+     if (htmlflag == 0)
+         htmlflag =
+           (cp = rindex (filename, '.')) != NULL &&
+           ( strcmp (cp, ".html") == 0 ||
+             strcmp (cp, ".htm") == 0 );
  
      if ((infile = fopen (filename, "r")) == NULL)
          {
*** ispell.h.orig Thu Oct 12 22:04:08 1995
--- ispell.h Tue Dec 16 22:50:00 1997
***************
*** 624,629 ****
--- 624,641 ----
  INIT (int tflag, DEFTEXFLAG);   /* NZ for TeX mode in current file */
  INIT (int prefstringchar, -1);  /* Preferred string character type */
  
+ /* The following two definitions added by
+  * Gerry Tierney
+  * 14th Oct 95
+  */
+ INIT (int htmlflag, 0);         /* HTML-checking state.
+                                  * 1=enable html-mode,
+                                  * 0=enable html-mode based on filename,
+                                  * -1=disable html-mode */
+ INIT (int insidehtml, 0);       /* Flag to indicate that the current html
+                                  * tag has spanned more than one line */
+ /* End of Gerry's Interference */
+
  INIT (int terse, 0);            /* NZ for "terse" mode */
  
  INIT (char tempfile[MAXPATHLEN], ""); /* Name of file we're spelling into */
---------------C-U-T------H-E-R-E-----------------

+---------------------------------+------------------------------------+
| Gerry Tierney,                  | You know there ain't no devil      |
| Computer Science Dept,          | There's just God when he's drunk!  |
| University College Dublin.      |                     ... Tom Waits  |
+---------------------------------+------------------------------------+
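
Appendix: how the html-mode decision works (illustrative only)

The README and the ispell.h comment describe three states for htmlflag:
1 (forced on with -h), 0 (decide from the file name) and -1 (forced off
with -t or -n). The following is a minimal standalone sketch of that
decision, written separately from the patch. The names html_mode and
use_html_mode are hypothetical and appear nowhere in ispell; strrchr is
used here as the standard C equivalent of the rindex call in ispell.c.
It is meant as an aid to reading the patch, not as a replacement for it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names: these mirror the three htmlflag states documented
 * in the ispell.h hunk, but are not part of ispell or of the patch. */
enum html_mode { HTML_OFF = -1, HTML_AUTO = 0, HTML_ON = 1 };

/* Return 1 if html-mode should be used for this file, 0 otherwise. */
int use_html_mode(enum html_mode flag, const char *filename)
{
    const char *ext;

    assert(filename != NULL);

    /* -h, -t and -n settle the question regardless of the file name. */
    if (flag != HTML_AUTO)
        return flag == HTML_ON;

    /* Otherwise look at the text after the last '.' in the name. */
    ext = strrchr(filename, '.');
    return ext != NULL
        && (strcmp(ext, ".html") == 0 || strcmp(ext, ".htm") == 0);
}
```

With these assumptions, use_html_mode(HTML_AUTO, "index.html") selects
html-mode, use_html_mode(HTML_AUTO, "README") does not, and passing
HTML_OFF corresponds to running ispell -n index.html.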