Validator.nu is validation 2.0.
RELAX NG validation—XML syntax and Compact Syntax
Schematron 1.5 validation (standalone schemas only—ISO Schematron or Schematron embedded in RELAX NG are not supported)
NVDL-driven validation.
XML 1.0 and HTML5 parsing.
Validator.nu does not check for XML 1.0 validity constraints. That is, DTD validation is not performed.
Validator.nu does not perform the duties of a “validating SGML parser” as defined in ISO 8879. In fact, this service does not have any SGML functionality at all. In particular, the HTML 4.01 support uses the HTML5 parser with some additional error conditions.
Validator.nu has two facets: generic (complex UI) and (X)HTML5 (simple UI).
Enter the URL (http, https or data IRI to be exact) of the document you want to validate in the field labeled “Document” and submit the form. That’s all it takes in most cases.
In the (X)HTML5 facet, the parser and the schema will be chosen based on the HTTP Content-Type of the document. In the generic facet, the parser will be chosen based on the HTTP Content-Type and a preset schema will be chosen based on the root namespace (for XML) or the doctype (for text/html).
For simplicity, the HTML5 facet only shows UI for validation by URL. Validation by text area and by file upload are available in the generic facet.
Here are bookmarklets:
There is a command-line script that uploads documents from the local filesystem to the (X)HTML5 validator. Integration into vim is available.
When the field for schemas is left empty, the validator will try to choose a schema on its own. If you are not happy with the guessed preset, you can specify a schema either by selecting a preset or by entering a space-separated list of schema URLs (http, https or data IRIs). In addition to actual schemas, you may use certain special URLs to invoke checkers that seem like special schemas but aren’t actually implemented as schemas.
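As a rough illustration of the schema field format described above, here is a minimal Python sketch that splits a space-separated schema list and checks that each entry uses one of the three schemes mentioned (http, https or data). The function name `parse_schema_field` is made up for this example and is not part of Validator.nu; the service's own parsing may differ in details.

```python
from urllib.parse import urlsplit

# Schemes the text above says are accepted for schema URLs.
ALLOWED_SCHEMES = {"http", "https", "data"}


def parse_schema_field(field):
    """Split a space-separated schema list and check each URL's scheme.

    Hypothetical helper for illustration only. An empty field yields an
    empty list, which corresponds to letting the validator pick a schema.
    """
    urls = field.split()
    for url in urls:
        scheme = urlsplit(url).scheme.lower()
        if scheme not in ALLOWED_SCHEMES:
            raise ValueError("unsupported schema URL: " + url)
    return urls
```

For instance, passing a string that contains an actual schema URL followed by http://c.validator.nu/table/ would return both entries in order, while an ftp URL would be rejected.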
If the automatic choice of parser does not work for you, you can choose the parser manually. The choice of parser affects the HTTP Accept request header that is sent.
When the lax option is set, text/html, text/xsl and text/plain are allowed as XML content types and text/plain is allowed as an HTML content type and, if the URL ends with .rnc, as a Compact Syntax content type. Also, in the lax mode the US-ASCII default for text/* XML types is not enforced.
Normally, schemas using the RELAX NG XML syntax, Schematron schemas and the XML documents to be validated are expected to be served using an XML content type. Schemas using the RELAX NG Compact Syntax are expected to be served using the application/relax-ng-compact-syntax content type. (The unregistered application/vnd.relax-ng.rnc content type is also understood.) HTML documents are expected to be served as text/html.
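The content type rules in the two paragraphs above can be summarized in a small Python sketch. This is a simplification, not the validator's actual code: the function name `is_acceptable` and the `kind` labels are invented here, and the definition of "an XML content type" (application/xml, text/xml or anything ending in +xml) is an assumption on my part that the text above does not spell out.

```python
def is_acceptable(kind, content_type, url="", lax=False):
    """Return True if content_type is acceptable for the given kind.

    kind is one of "xml", "html" or "rnc" (labels made up for this sketch).
    """
    # Ignore parameters such as "; charset=utf-8".
    mime = content_type.split(";", 1)[0].strip().lower()
    if kind == "xml":
        # Assumed definition of an XML content type.
        strict = mime in ("application/xml", "text/xml") or mime.endswith("+xml")
        return strict or (lax and mime in ("text/html", "text/xsl", "text/plain"))
    if kind == "html":
        return mime == "text/html" or (lax and mime == "text/plain")
    if kind == "rnc":
        strict = mime in (
            "application/relax-ng-compact-syntax",
            "application/vnd.relax-ng.rnc",
        )
        # In lax mode, text/plain counts only if the URL ends with .rnc.
        return strict or (lax and mime == "text/plain" and url.endswith(".rnc"))
    raise ValueError("unknown kind: " + kind)
```

The sketch leaves out the character encoding side of the lax option (the US-ASCII default for text/* XML types).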
When the “Show Image Report” checkbox is set, a report concerning the textual alternatives of img elements in the XHTML namespace is shown for accessibility review.
You may check the “Show Source” checkbox to show the decoded source of the document being checked. Please note that the source may not be shown in its entirety if the parser encounters a fatal error. Moreover, the show source feature shows the decoded Unicode source. Erroneous byte sequences in the original source and characters that would render the validator output as non-conforming (e.g. U+0000) are not represented faithfully.
If you want to create your own alternative mode of input or want to call Validator.nu (or your own local copy) from within your own application, there is a RESTful Web service API. In addition to the modes of input that work from HTML forms, you can also POST the document to be checked as an HTTP entity body. In addition to the default HTML output, the messages are also available as XHTML, XML, JSON, GNU error format and plain text.
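To give an idea of what calling the Web service with a POSTed entity body might look like, here is a minimal Python sketch using only the standard library. Several things here are assumptions not stated in this text: the `out=json` query parameter for selecting JSON output, and the shape of the JSON response (a top-level `messages` list whose items carry `type` and `message`). Please check the Web service API documentation before relying on either. The helper names `build_request` and `summarize` are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# The public instance; a private instance on localhost would work the same way.
SERVICE_URL = "https://validator.nu/"


def build_request(document, content_type="text/html; charset=utf-8",
                  service_url=SERVICE_URL):
    """Build (but do not send) a POST request carrying the document as the body.

    The "out" parameter name is an assumption about the API.
    """
    query = urllib.parse.urlencode({"out": "json"})
    return urllib.request.Request(
        service_url + "?" + query,
        data=document,  # bytes of the document to check
        headers={"Content-Type": content_type},
        method="POST",
    )


def summarize(response_body):
    """Turn a JSON report into short 'type: message' strings.

    Assumes a top-level "messages" list; missing fields are tolerated.
    """
    report = json.loads(response_body)
    return [
        "{}: {}".format(m.get("type", "?"), m.get("message", ""))
        for m in report.get("messages", [])
    ]
```

Sending the request is then a matter of passing the result of `build_request` to `urllib.request.urlopen` and feeding the decoded response body to `summarize`. Keeping request construction separate from sending makes the wrapper easy to test without touching the network.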
HTML5 (text/html-compatible content models)
HTML5 with ARIA (unendorsed integration prototype)
Mike(tm) Smith has generated documentation for this schema.
XHTML 1.0 Strict with IRI support. Generally suitable for HTML 4.01 Strict checking as well, although there are theoretically wrong corner cases. Uses backported HTML5 datatypes.
XHTML 1.0 Transitional with IRI support. Generally suitable for HTML 4.01 Transitional checking as well, although there are theoretically wrong corner cases. Uses backported HTML5 datatypes.
XHTML 1.0 Frameset with IRI support. Generally suitable for HTML 4.01 Frameset checking as well, although there are theoretically wrong corner cases. Uses backported HTML5 datatypes. Do not use. :-)
XHTML5 (XML-compatible content models)
XHTML5 with ARIA (unendorsed integration prototype), SVG 1.1, MathML 2.0 and holes for OpenMath, RDF and Inkscape cruft.
XHTML 1.0 (not 1.1), SVG 1.1 and MathML 2.0 with IRI support.
XHTML 1.0 (not 1.1), Ruby, SVG 1.1 and MathML 2.0 with IRI support.
A schema for XHTML Basic with IRI support. Suitable for use with the HTML parser.
SVG 1.1 Full with IRI support (Inkscape cruft not permitted).
The service supports a few special pseudo-schema URIs that map to checkers written in a Turing-complete programming language.
http://c.validator.nu/table/
Checks (X)HTML table integrity. The current implementation should be considered a prototype that has not yet been updated to match the latest spec language for HTML5. (See more detailed discussion.)
http://c.validator.nu/nfc/
Checks that constructs in the document tree are in the Unicode Normalization Form C and don’t start with a “composing character”. Using this pseudo-schema also enables normalization checking of source text. (See more detailed discussion.)
http://c.validator.nu/text-content/
Checks the text content of the (X)HTML5 meter, progress and time elements for conformance. (This is a prototype with liberties taken.)
http://c.validator.nu/unchecked/
Warns about RDF, OpenMath and Inkscape holes and about the use of version="1.0" in SVG.
http://c.validator.nu/usemap/
Checks the usemap attribute for referential integrity.
http://c.validator.nu/all/
Shorthand for http://c.validator.nu/table/ http://c.validator.nu/nfc/ http://c.validator.nu/text-content/ http://c.validator.nu/unchecked/ http://c.validator.nu/usemap/.
http://c.validator.nu/all-html4/
Shorthand for http://c.validator.nu/table/ http://c.validator.nu/nfc/ http://c.validator.nu/unchecked/ http://c.validator.nu/usemap/.
http://c.validator.nu/debug/
Dumps parse events as warnings.
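The two shorthand pseudo-schemas listed above expand to fixed lists of checker URLs. The following Python sketch shows that expansion as a lookup table. The URLs are taken from the list above; the function `expand_pseudo_schemas` itself is a made-up helper and says nothing about how the service implements the shorthands internally.

```python
BASE = "http://c.validator.nu/"

# Expansions as documented in the list above.
SHORTHANDS = {
    BASE + "all/": [
        BASE + name
        for name in ("table/", "nfc/", "text-content/", "unchecked/", "usemap/")
    ],
    BASE + "all-html4/": [
        BASE + name
        for name in ("table/", "nfc/", "unchecked/", "usemap/")
    ],
}


def expand_pseudo_schemas(urls):
    """Replace shorthand URLs with the checkers they stand for.

    Other URLs pass through unchanged; duplicates are dropped while
    keeping first-seen order.
    """
    expanded = []
    for url in urls:
        for item in SHORTHANDS.get(url, [url]):
            if item not in expanded:
                expanded.append(item)
    return expanded
```

Note that the only difference between the two shorthands is that all-html4/ leaves out the text-content/ checker, which targets (X)HTML5 elements.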
Your server cannot properly deal with an Accept header that does not have */* in it. Chances are that you are using Apache 1.3, PHP and MultiViews together. MultiViews thinks the type of your page is application/x-httpd-php, which isn’t in the Accept header. Apache 2 does not have this problem.
No, Validator.nu does not give badges.
I have observed that once people are given badges they start to feel entitled to the badges and become hostile if the validation service is changed so that some documents that previously were proclaimed valid no longer are. I do not want to deliberately incite an opposition to bug fixes. I know some of the schemas are not as tight as the corresponding spec prose. If I make them tighter, consider it a bug fix. Moreover, the HTML 5 spec is still changing, so the schema will change as well. Finally, I may (and even intend to) change the namespace associations of preset schemas in the future.
In addition to the problem with changing the validator after badges have been awarded, badges don’t provide value to the readers of validated pages. Validation is a tool for you as a page author—not something your readers need to verify. However, if you are writing about Web authoring and want to refer others to Validator.nu, please, by all means feel free to link to Validator.nu.
By the time Ruby on Rails hit everyone’s radar, this project was already underway. However, Ruby would still have been a bad choice had I considered it seriously earlier. Ruby lacks a solid Unicode infrastructure. I’ve already been in a situation where I had to stop writing app code and spend time writing the very basics of Unicode infrastructure. I don’t want to be in that situation again. Ruby lacks solid XML infrastructure as well.
I chose Java over Python for three reasons: SAX, Jing and more experience with Java. Apart from Java feeling like a more secure choice because I had more experience with it, the choice between Java and Python also comes down to infrastructure. Having a platform-wide unified way for plugging together XML tools is extremely important when what you are doing entails plugging together XML tools efficiently.
Java is in a unique position when it comes to XML tool infrastructure. Java has a lot of XML-related libraries available and they pretty much all plug into the same interface. Not only is there a platform-wide XML API, it also happens to be one of the most complete and correct of the XML APIs around. From the point of view of RELAX NG, Java being the language Jing is written in is an extremely important consideration. Jing is a seriously good piece of software. Moreover, Java is the native language of the extensibility interface for RELAX NG datatype libraries.
While I’m on a soap box, I should mention that ICU4J is a seriously good piece of software, too, and having Java’s notion of Unicode frozen as UTF-16 from the dawn of time until eternity is very important considering the stability of infrastructure. It is a horribly bad idea that the meaning of Python programs changes (due to datatypes changing underneath) depending on how the interpreter was compiled. Unicode is optimized for 16-bit units. The stability of sticking to UTF-16 in RAM everywhere outweighs the theoretical purity of UTF-32 in RAM. (On disk and network, use UTF-8, of course.)
I do want to make the validator functionality available to applications that are not written in Java, though. This is why Validator.nu has a Web service interface that can be used either with the instance running at validator.nu or with your private instance running at localhost. I encourage you to write a wrapper library for the Web service in your favorite programming language.
I think DTDs are bad in four ways:
DTDs pollute the document with schema-specific syntax. Since the document itself declares the rules, the question answered by DTD validation is not the question that should be asked. DTD validation answers the question “Does this document conform to the rules it declares itself?” The interesting question is “Does this document conform to these rules?” when the person who asks the question chooses the rules the question is about.
DTDs mix a validation mechanism, an inclusion mechanism and an infoset augmentation mechanism. The inclusion mechanism is mainly used for character entities, which solve (but only if the DTD is processed and processing it is not required!) an input problem by burdening the recipient instead of keeping input matters between the editing software and the document author.
DTDs aren’t particularly expressive.
DTDs don’t support Namespaces in XML.
I hope providing an online validation service for RELAX NG removes the excuse that DTDs are needed for online validators.
“Validation” and “validator” in the name and the user interface of the service refer to the ISO/IEC FDIS 19757-2 definition of “validator” (which performs validation), to the Schematron “validation” function (which is performed by a validator), and to the HTML 5 definition of “validator”.
Schemas for XHTML 1.0 are used for HTML 4.01, because XHTML 1.0 is supposed to be a reformulation of HTML 4.01 in XML. However, there are some subtle spec bugs introduced in the reformulation. For this reason, some errors for HTML 4.01 are wrong. For example, XHTML 1.0 (in the DTD) forbids the name attribute on the form element, although it is allowed in HTML 4.01.
Please refer to the bug tracker for other known issues and for ideas for future development.
The preferred forum for discussing issues related to using the (X)HTML5 validator is the WHATWG Help mailing list. The preferred forum for discussing issues related to implementing (X)HTML5 validators in general and this one in particular is the WHATWG Implementors mailing list. Bugs should be reported to Validator.nu Bugzilla.
ID/IDREF/IDREFS checking in RELAX NG is enabled for the benefit of those who use their own schemas and expect this feature to work. However, the preset schemas do not use RELAX NG ID/IDREF/IDREFS features, because the checking isn’t precise enough (cannot require that the referent is of a certain type) and using these features places really annoying restrictions on the schemas.
Comments are not exposed to the validation layer and, therefore, cannot be matched in Schematron.
The document is validated independently (but concurrently) against each schema. The Schematron validators do not see IDness assignments from the RELAX NG validators.
Embedded Schematron is not supported.
xml:id processing is performed. Also, the attribute id in no namespace is given IDness unless the host element is a CML element. This means that both xml:id and (X)HTML id are matched by the XPath id() function. SVG 1.2 IDness rules are not honored.
The following datatype libraries are supported:
The RELAX NG DTD Compatibility library (http://relaxng.org/ns/compatibility/datatypes/1.0)
The W3C XML Schema Datatypes library (http://www.w3.org/2001/XMLSchema-datatypes)
RELAX NG Datatype Library for HTML5 Datatypes (http://whattf.org/datatype-draft). This is not a stable library, so you should not rely on it at this time.
The HTML parser emits parse events as if it were parsing an equivalent XHTML flavor document. Therefore, the schemas should assume lowercase element names in the XHTML namespace and attributes in no namespace (except that the lang attribute maps to xml:lang).
The HTML 4.01 parsing mode does not use an SGML parser. Instead, the HTML5 parser is used in an HTML 4.01 compatibility mode. The names of boolean attributes are repeated as values for compatibility with XHTML 1.0 schemas. (This does not happen in the HTML5 mode.)
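For schema authors, it may help to picture what the schema sees after the HTML parser has done its work. The Python sketch below models the mapping described in the two paragraphs above: lowercase element names in the XHTML namespace, attributes in no namespace, lang mapped to xml:lang, and (only in the HTML 4.01 mode) boolean attribute names repeated as values. This is an illustration, not the parser's code. The function name, the tuple representation, the short list of boolean attributes and the assumption that a minimized boolean attribute arrives with an empty value are all mine.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# A few HTML 4.01 boolean attributes, for illustration only (not exhaustive).
BOOLEAN_ATTRS = {"checked", "disabled", "selected", "readonly", "multiple"}


def to_xhtml_event(tag, attrs, html4_mode=False):
    """Model a start-tag event as ((namespace, local name), attribute dict).

    Attribute keys are (namespace, local name) tuples; "" means no namespace.
    """
    element = (XHTML_NS, tag.lower())
    mapped = {}
    for name, value in attrs.items():
        name = name.lower()
        # Assumption: a minimized boolean attribute shows up with an empty value.
        if html4_mode and name in BOOLEAN_ATTRS and value == "":
            value = name
        key = (XML_NS, "lang") if name == "lang" else ("", name)
        mapped[key] = value
    return element, mapped
```

With this model, an HTML 4.01 start tag for an input element with a minimized checked attribute would reach the schema as an input element in the XHTML namespace with checked="checked", which is the form the XHTML 1.0 schemas expect.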
The code is hosted on GitHub. Please see the build instructions.
I would like to thank the Mozilla Foundation and the Mozilla Corporation for funding this project.
I would like to thank James Clark for writing Jing and for championing RELAX NG and XML. I would also like to thank everyone who tested the development builds, the writers of test cases and everyone who has developed library code and schemas that the service uses.
Mike(tm) Smith has contributed numerous fixes and updates to HTML5 validation and is the most active developer of the project as of 2014.
Philip Jägenstedt contributed Microdata validation support.
The XHTML 1.0 schemas were originally written by James Clark and have been improved by Petr Nálevka.
fantasai designed the (X)HTML5 schema framework, wrote the (X)HTML5 Core schemas and helped along the way when I added features.
JavaScript bits, the favicon and a lot of bug reports were contributed by Simon Pieters.
The schemas for RELAX NG and XSLT were written by James Clark.
The principal author of the schema for DocBook is Norman Walsh.
The SVG schemas come from the W3C.
The MathML schema was written by Yutaka Furubayashi.
Test cases written by fantasai, Anne van Kesteren and Christoph Schneegans were very useful in developing this service.
This product includes software developed by The Apache Software Foundation (http://www.apache.org/).
This product uses The SAXON XSLT Processor from Michael Kay.
Focuses on HTML, XHTML, WML. Uses SGML DTDs and custom code for HTML. Uses XSD and custom code for XHTML. Recently added support for RSS and Atom, but that feature is still in flux.
Validates using the XSD implementation of XHTML 1.0.
Uses RELAX NG and Schematron for validating XHTML and HTML. (The XHTML 1.0 schemas offered here as presets are based on the schemas used in Relaxed.)
DTD-based SGML and XML validation.
Checks Atom and RSS feeds. Uses Python as the schema language. :-)
Checks CSS style sheets.
DTD-based SGML and XML validation.
These terms only apply to the service hosted on the validator.nu domain. If you arrived at this page from another instance of the software run by someone else, such as the W3C, that instance may have different terms.
If you do not accept these terms, do not use the service. You can run your own copy of the software under the applicable Open Source licenses without having to agree to these terms.
These terms may be updated from time to time. There are no email notifications of updates in order not to have to collect your email address.
The software instance on validator.nu is operated by Henri Sivonen on Gandi's infrastructure. The point of contact in all matters related to the deployment instance on validator.nu is Henri Sivonen. (For matters relating to the validator software itself rather than the specific deployment instance on validator.nu, please refer to GitHub issues of the software project.)
There is absolutely no warranty or guarantee of level of service. If you want uptime guarantees, please run your own copy of the software. The service may be discontinued at any time without prior notice.
The service at validator.nu is meant for validating public Web pages (GET request mode) and for validating drafts of pages that are being prepared to be published on the Web (POST request mode). By design, the service does not ask for passwords to be able to validate pages that are behind login. You must not grant the validator instance at validator.nu special access to your site e.g. by IP address. If you wish to validate behind-login or otherwise private pages, please run your own copy of the validator software. Do not upload sensitive data in a POST request. (E.g. do not upload real confidential records within your HTML if you are developing an HTML UI that deals with such data.)
You must not use the service to validate illegal content or engage in activity that has the appearance of botnet activity.
Do not place excessive load on the service. It's fine to use the API from the content management system of your personal blog. If you have a large blog hosting service, please run your own copy of the software. You must not use a browser extension that sends the content of every page you browse to the validator. If you want to see a validity indicator for every page, please run your own copy of the validator software.
For HTTP requests, the service is typically configured to log non-personally-identifiable usage information including the virtual server host name accessed, the path accessed, the HTTP method, the response code, the number of bytes transferred, the access time, and the User-Agent header your client software sent (i.e. the name and version of your Web browser).
In successful normal operation, your IP address is not logged in the clear. An anonymized hash of it may be logged even during normal operation. The hash is computed with a keyed hash function whose key is kept in RAM and discarded from time to time. This makes general usage statistics analysis possible while making it infeasible to reverse the hash by brute force, even for a small search space such as the space of IPv4 addresses.
If the service encounters an error, it may log the error and include your IP address and/or the URL being validated in the logged error event. These logs are deleted from time to time after fixing the errors or ignoring them as unactionable. More general IP address logging may be temporarily turned on to investigate abuse of the service. Afterwards, the IP addresses will be anonymized as described in the above paragraph. However, IP addresses deemed to have caused abusive traffic may be retained as part of a blocklist.
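To illustrate the idea of anonymizing IP addresses with a keyed hash whose key lives only in RAM, here is a minimal Python sketch. It is not the code the service runs. The choice of HMAC-SHA-256, the 32-byte key size and the class name `IpAnonymizer` are assumptions made for this example; these terms do not say which keyed hash function is used.

```python
import hashlib
import hmac
import secrets


class IpAnonymizer:
    """Hash IP addresses with a random key that is never written to disk."""

    def __init__(self):
        # The key exists only in this process's memory.
        self._key = secrets.token_bytes(32)

    def rotate_key(self):
        """Discard the old key; earlier hashes can no longer be recomputed."""
        self._key = secrets.token_bytes(32)

    def anonymize(self, ip):
        """Return a hex digest that is stable only while the key is unchanged."""
        digest = hmac.new(self._key, ip.encode("ascii"), hashlib.sha256)
        return digest.hexdigest()
```

While one key is in use, the same address always maps to the same digest, so requests can be counted per client. Without the key, enumerating all IPv4 addresses does not help an attacker, because the digests cannot be recomputed. Once the key is rotated, digests from before and after the rotation can no longer be correlated.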
The URLs of the pages you validate may be kept for a limited time to understand abuse of the service. (Since anyone can validate anyone else's public Web page and you are only allowed to validate public pages by URL, the URLs are not considered personally identifying of the person asking for the validation.)
These logs are meant to be visible to Henri Sivonen only, but there's no technical way for him to prevent Gandi from gaining access to these logs (though they aren't supposed to look). Aggregate usage statistics may be shared publicly. Government requests may be responded to.
The content of POST requests may be written to a temporary file. While these are deleted after processing the request, in principle they might leave forensically recoverable data on disk until actually overwritten.