Traversing a Document with DOM4J

jjui, 10 years ago

dom4j offers several different options for traversing a Document object and its children.

Iterators, Lists, and Index-Based Access

For example, to print the value of the location attribute of every child element of an Element:

public void outputLocationAttributes(Element parent) {
    // Walk the child elements with an Iterator
    for (Iterator it = parent.elementIterator(); it.hasNext();) {
        Element child = (Element) it.next();
        String value = child.attributeValue("location");
        if (value == null) {
            System.out.println("No location attribute");
        } else {
            System.out.println("Location attribute value is " + value);
        }
    }
}

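Note that this example uses elementIterator(), a convenience method that returns a java.util.Iterator over the List returned by elements(). If you would rather use index-based access than the Iterator interface, you can use the nodeCount() and node() methods. Because node() returns every kind of child node (text, comments, and so on), the loop has to check for Element instances: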

public void outputLocationAttributes2(Element parent) {
    for (int i = 0; i < parent.nodeCount(); i++) {
        Node node = parent.node(i);
        // Skip text, comments, and other non-element nodes
        if (node instanceof Element) {
            Element child = (Element) node;
            String value = child.attributeValue("location");
            if (value == null) {
                System.out.println("No location attribute");
            } else {
                System.out.println("Location attribute value is " + value);
            }
        }
    }
}
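The section heading also mentions lists, so here is a minimal sketch of the third option: iterating directly over the List returned by elements(). The class name ListAccessSketch, the method locationValues, and the sample XML are made up for illustration. The loop variable is typed as Object so that the code should compile against both the raw-typed 1.x and the generic 2.x releases of dom4j.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

// Hypothetical helper class, not part of dom4j.
public class ListAccessSketch {

    // Collects the "location" attribute of each child element (null when absent).
    public static List<String> locationValues(Element parent) {
        List<String> values = new ArrayList<String>();
        // elements() returns only child elements, so no instanceof check is needed.
        for (Object item : parent.elements()) {
            Element child = (Element) item;
            values.add(child.attributeValue("location"));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Sample input invented for this sketch.
        Document doc = DocumentHelper.parseText(
            "<sites><site location=\"north\"/><site/><site location=\"south\"/></sites>");
        System.out.println(locationValues(doc.getRootElement()));
    }
}
```

Compared with nodeCount() and node(), this form never sees text or comment nodes, which is usually what you want when only attributes of child elements matter.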

XPath

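dom4j has an XPath interface whose instances are created by the createXPath() method of DocumentFactory or of DocumentHelper. What sets dom4j's XPath interface apart is that it can sort results by an XPath expression, whether that is an existing list of Node objects (the sort() method) or the result of evaluating an expression (the two- and three-argument selectNodes() methods).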

As an example, consider the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<books>
    <book>
        <title>Java &amp; XML</title>
        <pubDate>2006</pubDate>
    </book>
    <book>
        <title>Learning UML</title>
        <pubDate>2003</pubDate>
    </book>
    <book>
        <title>XML in a Nutshell</title>
        <pubDate>2004</pubDate>
    </book>
    <book>
        <title>Apache cookbook</title>
        <pubDate>2003</pubDate>
    </book>
</books>

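To list the book titles sorted by publication date, you can create two separate XPath expressions, one that selects the book elements and one that supplies the sort key, and then use them together like this: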

package javaxml3;

import java.io.File;
import java.util.Iterator;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.XPath;
import org.dom4j.io.SAXReader;

public class SortingXPath {
    public static void main(String[] args) throws Exception {
        Document doc = new SAXReader().read(new File("books.xml"));
        XPath bookPath = DocumentHelper.createXPath("//book");
        XPath sortPath = DocumentHelper.createXPath("pubDate");
        // sortPath is the XPath used as the sort key
        List books = bookPath.selectNodes(doc, sortPath);
        for (Iterator it = books.iterator(); it.hasNext();) {
            Element book = (Element) it.next();
            System.out.println(book.elementText("title"));
        }
    }
}

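The titles are printed in ascending order of publication date, starting with Learning UML and ending with Java & XML. There is no built-in mechanism for sorting in descending order. Instead, you can reverse the list with the static reverse() method of java.util.Collections. The three-argument selectNodes() method can additionally remove Node objects whose sort value duplicates one already in the result list (when the third argument is true; when it is false, duplicates are kept). Applied to the nodes in the example above, the call looks like this: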

List books = bookPath.selectNodes(doc,sortPath,true);

This prints only three titles. Apache cookbook is left out because it has the same publication date as Learning UML.
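The descending-order workaround mentioned above can be sketched as follows. This is an illustration rather than anything dom4j prescribes: the class DescendingSortSketch and the method titlesNewestFirst are hypothetical names, and copying the result into a new ArrayList before reversing is a defensive assumption, not a documented requirement.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.XPath;

// Hypothetical helper class, not part of dom4j.
public class DescendingSortSketch {

    // Returns the book titles ordered from the latest pubDate to the earliest.
    public static List<String> titlesNewestFirst(Document doc) {
        XPath bookPath = DocumentHelper.createXPath("//book");
        XPath sortPath = DocumentHelper.createXPath("pubDate");

        // selectNodes(context, sortXPath) yields the books in ascending order.
        // Copy into our own list before reversing it in place.
        List<Object> books = new ArrayList<Object>(bookPath.selectNodes(doc, sortPath));
        Collections.reverse(books);

        List<String> titles = new ArrayList<String>();
        for (Object item : books) {
            titles.add(((Element) item).elementText("title"));
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Inline sample so the sketch does not depend on a books.xml file.
        Document doc = DocumentHelper.parseText(
            "<books>"
            + "<book><title>Java &amp; XML</title><pubDate>2006</pubDate></book>"
            + "<book><title>Learning UML</title><pubDate>2003</pubDate></book>"
            + "<book><title>XML in a Nutshell</title><pubDate>2004</pubDate></book>"
            + "</books>");
        for (String title : titlesNewestFirst(doc)) {
            System.out.println(title);
        }
    }
}
```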

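Besides the XPath class, the Node interface has several methods that evaluate an XPath expression passed in as a plain String. The XPath-specific methods of the Node interface are: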

public interface Node {
    List selectNodes(String xpathExpression);
    Object selectObject(String xpathExpression);
    List selectNodes(String xpathExpression, String comparisonXPathExpression);
    List selectNodes(String xpathExpression, String comparisonXPathExpression, boolean removeDuplicates);
    Node selectSingleNode(String xpathExpression);
    String valueOf(String xpathExpression);
    Number numberValueOf(String xpathExpression);
    boolean matches(String xpathExpression);
}

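Behind the scenes, these methods generally use the XPath class to evaluate the expression passed to them. Because they take a String, each call will generally create a new XPath object. So if you need to evaluate the same XPath expression repeatedly, the XPath class is the better choice, since your expression is compiled only once. In addition, these methods cannot handle namespaces, variables, or custom functions; if you need any of those, the XPath class is your only option. That does not make these methods useless, though. They are very convenient and keep the amount of code small.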
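To make the trade-off concrete, here is a rough sketch that contrasts the String-based convenience method with a reusable XPath object, and shows one way to bind a namespace prefix on an XPath. The class CompiledXPathSketch, its method names, and the urn:example:books namespace are invented for this example. setNamespaceURIs() is the XPath method used here for prefix bindings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.XPath;

// Hypothetical helper class, not part of dom4j.
public class CompiledXPathSketch {

    // Compiled once and reused for every book.
    private static final XPath TITLE_PATH = DocumentHelper.createXPath("title");

    // Convenience form: short, but the String is handed over on every call.
    public static String titleViaNode(Element book) {
        return book.valueOf("title");
    }

    // Reusable form: the same XPath object is applied to each context node.
    public static List<String> titlesViaCompiledXPath(Document doc) {
        List<String> titles = new ArrayList<String>();
        for (Object book : doc.selectNodes("//book")) {
            titles.add(TITLE_PATH.valueOf(book));
        }
        return titles;
    }

    // Namespace-aware form: binds a prefix to a URI before evaluation.
    public static XPath namespacedPath(String expression, String prefix, String uri) {
        XPath path = DocumentHelper.createXPath(expression);
        path.setNamespaceURIs(Collections.singletonMap(prefix, uri));
        return path;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentHelper.parseText(
            "<books><book><title>Learning UML</title></book>"
            + "<book><title>XML in a Nutshell</title></book></books>");
        System.out.println(titlesViaCompiledXPath(doc));

        // The namespace URI below is a placeholder chosen for this sketch.
        Document nsDoc = DocumentHelper.parseText(
            "<b:books xmlns:b=\"urn:example:books\"/>");
        XPath nsPath = namespacedPath("/b:books", "b", "urn:example:books");
        System.out.println(nsPath.selectSingleNode(nsDoc) != null);
    }
}
```

For a handful of one-off lookups the String-based methods are the simpler choice. The compiled form pays off mainly inside loops or when namespaces are involved.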

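A few other methods of the Node interface are worth mentioning. getPath() and getUniquePath() return an XPath expression that, when evaluated, yields a node list containing the current node. getUniquePath() goes one step further than getPath() by adding indexes so that the expression evaluates to this node only. Besides the no-argument versions, both methods are overloaded to accept an Element, in which case they produce a relative XPath expression from the passed Element to the current node.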
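The difference between the four variants is easiest to see side by side. The sketch below is only an illustration: NodePathSketch and describe are hypothetical names, and the sample document is a shortened version of the books example.

```java
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Node;

// Hypothetical helper class, not part of dom4j.
public class NodePathSketch {

    // Returns, in order: getPath(), getUniquePath(),
    // getPath(context), getUniquePath(context).
    public static String[] describe(Node node, Element context) {
        return new String[] {
            node.getPath(),
            node.getUniquePath(),
            node.getPath(context),
            node.getUniquePath(context)
        };
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentHelper.parseText(
            "<books>"
            + "<book><title>Learning UML</title></book>"
            + "<book><title>XML in a Nutshell</title></book>"
            + "</books>");
        // Pick the title of the second book and describe it relative to the root.
        Node secondTitle = doc.selectSingleNode("/books/book[2]/title");
        for (String path : describe(secondTitle, doc.getRootElement())) {
            System.out.println(path);
        }
    }
}
```

For the second book's title, getPath() should give an expression that matches every title (along the lines of /books/book/title), while getUniquePath() should add the index on the book step (along the lines of /books/book[2]/title). The variants that take the root element drop the leading /books step.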
