Advanced schema-aware streaming XML parser (fully typed output for TypeScript!)

尘心凡梦 9年前

来自: https://github.com/charto/cxml

cxml

 

cxml aims to be the most advanced schema-aware streaming XML parser for JavaScript and TypeScript. It fully supports namespaces, derived types and substitution groups. It can handle pretty hairy schema such as GML , WFS and extensions to them defined by INSPIRE . Output is fully typed and structured according to the actual meaning of input data, as defined in the schema.

Introduction

For example this XML:

<dir name="123">      <owner>me</owner>      <file name="test" size="123">          data      </file>  </dir>

can become this JSON (run npm test to see it happen):

{    "dir": {      "name": "123",      "owner": "me",      "file": [        {          "name": "test",          "size": 123,          "content": "data"        }      ]    }  }

Note the following:

  • "123" can be a string or a number depending on the context.
  • The name attribute and owner child element are represented in the same way.
  • A dir has a single owner but can contain many files, so file is an array but owner is not.
  • Output data types are as simple as possible while correctly representing the input.

See theexample schema that makes it happen. Schemas for formats like GML and SVG are nastier, but you don't have to look at them to use them through cxml .

Relevant schema files should be downloaded and compiled usingcxsd before using them to parse documents. Check out the example schema converted to TypeScript .

There's much more. What if we parse an empty dir:

import * as cxml from 'cxml';  import * as example from 'cxml/test/xmlns/dir-example';    var parser = new cxml.Parser();    var result = parser.parse('<dir name="empty"></dir>', example.document);

Now we can print the result and try some magical features:

result.then((doc: example.document) => {        console.log( JSON.stringify(doc) );  // {"dir":{"name":"empty"}}      var dir = doc.dir;        console.log( dir instanceof example.document.dir.constructor );   // true      console.log( dir instanceof example.document.file.constructor );  // false        console.log( dir instanceof example.DirType );   // true      console.log( dir instanceof example.FileType );  // false        console.log( dir._exists );          // true      console.log( dir.file[0]._exists );  // false (not an error!)    });

Unseen in the JSON output, every object is an instance of a constructor for the appropriate XSD schema type. Its prototype also contains placeholders for valid children, which means you can refer to a.b.c.d._exists even if a.b doesn't exist. This saves irrelevant checks when only the existence of a deeply nested item is interesting. The magical _exists flag is true in the prototypes and false in the placeholder instances, so it consumes no memory per object.

We can also process data as soon as the parser sees it in the incoming stream:

parser.attach(class DirHandler extends (example.document.dir.constructor) {        /** Fires when the opening <dir> and attributes have been parsed. */        _before() {          console.log('Before ' + this.name + ': ' + JSON.stringify(this));      }        /** Fires when the closing </dir> and children have been parsed. */        _after() {          console.log('After  ' + this.name + ': ' + JSON.stringify(this));      }    });

The best part: your code is fully typed with comments pulled from the schema!

Related projects

  • node-xml4js uses schema information to read XML into nicely structured objects.

License

The MIT License

Copyright (c) 2016 BusFaster Ltd