HTML Syntax Checker


What is an HTML Syntax Checker?
The HTML Syntax Checker is a program that tries to help you find problems in your webpages, or at least the problems that keep them from working correctly. It does this by reading through your code and examining the tags. If you haven't yet tried to make your webpage, or at least read the relevant QuickReads (from the HTML Tutorial) and the Assignments, then most of this won't make sense yet.
First off, this program will NOT make the webpage for you, and it doesn't change any of your code. What it does do is check for some common errors and report them to you. It will also attempt to tell you which line number you should be looking at. Again, it does NOT fix any problems; it only alerts you to them.

 
 
So what exactly does the checker tell me?
It will (or at least try to):
  • Print a list of the tags you used, how many times you used them, and on which lines you used them
  • Check that you have exactly 1 each of the "main" tags that should always be in a webpage (html, body, and such) and that they are in proper order
  • Check that you have equal numbers of opening and closing tags for the tags that need both (for example: that you have the same number of <b> tags as </b> tags)
  • Print a list of tags it doesn't understand (probably typos, like saying <titel> instead of <title>)
  • Print a list of any tags you started (opened) but didn't close (for example, having a <b> without a </b> tag)
  • Print a list of any tags you closed but hadn't yet opened (having the </b> before the <b>)
  • Check that the number of >'s is equal to the number of <'s in your page
  • Print a list of lines whose tags contained an odd number of " or ' marks (which is likely an error)
  • Check that "named" things (tags with a name=something attribute in them) are only named once (this really only matters for JavaScript and forms)
  • Check that the attributes listed for a tag are valid (for example, the font tag can have an attribute of color, but not an attribute of border)
  • Check that values for attributes are reasonable (for example, font size=1 is valid but size=x is not valid)
For all of these it will try to print the relevant line numbers. If you are missing a closing tag, it will try to give the most likely lines that the opening tag was on. Just because there is an error doesn't mean the page won't work at all (though it might not). In fact, a page can look correct in Netscape and still not be correct HTML code. This is because Netscape (and IE) will try to do what they think you meant to do, and not simply what you said to do. The checker does a reasonable job with comments, except when they occur within a tag (which isn't correct anyway). It does NOT check that your JavaScript code is valid, as it is only concerned with HTML tags. If your JavaScript code contains a "<" or ">" (e.g. for comparing numbers), the checker might think that the < is the beginning of an HTML tag. This may be fixed in the future. The checker usually does correctly identify missing " problems in JavaScript.
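
To make the open/close checks more concrete, here is a minimal sketch in Python of how unclosed and unopened tags could be found along with their line numbers. This is NOT the checker's actual code (the checker is a Perl program). The function name find_unbalanced_tags, the VOID_TAGS list, and the tag pattern are all assumptions made up for this illustration.

```python
import re
from collections import defaultdict

# Assumed, partial list of tags that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

# Matches an opening or closing tag and captures the slash and the tag name.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def find_unbalanced_tags(html):
    """Return (unclosed, unopened) lists of (tag, line_number) pairs."""
    open_lines = defaultdict(list)  # tag name -> lines of still-open tags
    unopened = []
    for line_number, line in enumerate(html.splitlines(), start=1):
        for match in TAG_PATTERN.finditer(line):
            is_closing = match.group(1) == "/"
            tag = match.group(2).lower()
            if tag in VOID_TAGS:
                continue
            if not is_closing:
                # Remember where this tag was opened.
                open_lines[tag].append(line_number)
            elif open_lines[tag]:
                # A matching opening tag exists, so pair it off.
                open_lines[tag].pop()
            else:
                # Closing tag with nothing open: closed before opened.
                unopened.append((tag, line_number))
    # Whatever is still recorded as open was never closed.
    unclosed = sorted(
        ((tag, n) for tag, lines in open_lines.items() for n in lines),
        key=lambda pair: pair[1],
    )
    return unclosed, unopened


sample = "<html>\n<body>\n<b>bold text\n</i>\n</body>\n</html>"
unclosed, unopened = find_unbalanced_tags(sample)
print("Opened but never closed:", unclosed)
print("Closed but never opened:", unopened)
```

For the sample page, the sketch reports the <b> on line 3 as never closed and the </i> on line 4 as never opened. It also shares the weakness described above: a script line such as if (a<b) x>y would be read as containing a <b> tag.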

 
 
Sounds great, so how do I use it?
After creating a webpage (index.html) in your grove account (c3063xxx), you can use this program in one of two ways. Both ways produce the same output. The easiest way is probably to use the checker through a web page at http://grove.ufl.edu/cgi-bin/cgiwrap/u3063bnk/htmlcheck.cgi. Enter your username (grove account) and the name of the file you wish to check (probably index.html) into the form.

The second way to use the checker is to call it directly from within a telnet window to grove (the place where you created the webpage). Call this program by typing:
/class/cgs3063/u3063bnk/bin/htmlcheck
OR
~/../u3063bnk/bin/htmlcheck
This must be typed while in your public_html directory (the same place where your index.html file is, so you can type that command right after using pico). The program will check through your index.html file. If you have a file other than index.html that you would like to check (which you will later) then use the command
/class/cgs3063/u3063bnk/bin/htmlcheck filename
OR
~/../u3063bnk/bin/htmlcheck filename
where filename is the full name of the file (like main.html) you want to check.
You can also copy this program to your own directory (for convenience) by:
cp /class/cgs3063/u3063bnk/bin/htmlcheck . (notice the . at the end)
while in your public_html directory. This way in the future you will only have to type:
htmlcheck
instead of the other whole long mess.

If the program seems to be stuck (meaning it doesn't display anything for a while), press the control key and the letter c at the same time to break out of the program (and please tell us what happened so that we can see if we can fix the checker).

 
 
Limitations
Note: this program does not catch all errors. The testing of attribute values is pretty generic and may let invalid things go by. You are not guaranteed a good grade just because the checker did not find any problems. However, it is likely to catch many of the simple errors that can be made. Also, this program is not complete: it is in what is called the "Beta" stage (Beta stage means I think it works, but I haven't tested every possible case and won't guarantee that it works). It may give the wrong output from time to time, but not often (if it does, send us an email with your username and a description of what went wrong to c3063qjq). The syntax checker does not change your webpage (or any other file), so there isn't really any danger in using it. It can save you many hours of testing your page.
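
To show what a "pretty generic" value test might look like, and why such a test can let invalid things go by, here is a small Python sketch. It is only an illustration under assumptions, not the checker's real rules (the checker is written in Perl). The ATTRIBUTE_RULES table and the check_attribute function are hypothetical names invented for this example.

```python
import re

# Hypothetical rule table: tag -> attribute -> pattern the value must match.
# The patterns are deliberately generic, as an assumption for this sketch.
ATTRIBUTE_RULES = {
    "font": {
        "size": re.compile(r"[+-]?\d+"),                    # any whole number
        "color": re.compile(r"#[0-9A-Fa-f]{6}|[A-Za-z]+"),  # hex code or any word
    },
}


def check_attribute(tag, attribute, value):
    """Classify one attribute as ok, or say what is wrong with it."""
    rules = ATTRIBUTE_RULES.get(tag.lower())
    if rules is None:
        return "unknown tag"
    pattern = rules.get(attribute.lower())
    if pattern is None:
        return "invalid attribute"
    if pattern.fullmatch(value) is None:
        return "invalid value"
    return "ok"


print(check_attribute("font", "size", "1"))      # ok
print(check_attribute("font", "size", "x"))      # invalid value
print(check_attribute("font", "border", "1"))    # invalid attribute
print(check_attribute("font", "color", "bleu"))  # ok, even though "bleu" is a typo
```

The last call is the interesting one: because the color pattern accepts any word, a misspelled color name passes. A generic test like this catches obviously wrong values but cannot tell whether a plausible-looking value is actually correct.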

 
 
And beyond ...
For those of you who are interested, feel free to copy and modify the code, provided the beginning comments regarding the owner are maintained. This program was written for Perl 5. If you make modifications you think would be of benefit to the class, tell us (email the question account c3063qjq; the modified version must be able to run on grove). Also, the "alpha" version is in the same directory as htmlcheck and has a filename of ec (NOTE: the alpha version is not guaranteed to work correctly; in fact, it is not even guaranteed to run at all). htmlcheck may change from time to time as we find ways to improve it. ec could change minute by minute as we test/play with it.

 

