acl_cpp programming: streaming XML parsing and creation - 白红宇



Published: 2019-06-27


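XML is one of the most important data formats in web development today and is very widely used. Another article described in detail how the acl library implements streaming XML parsing. The acl_cpp library uses C++ language features to wrap acl's streaming XML parser further, which makes it more convenient to use. Two classes are mainly involved: the xml class and the xml_node class. This article briefly introduces the functions of each of them.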

I. Usage when parsing

1. The main methods of the xml class are as follows:

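```cpp
/**
 * Feed XML data to the parser. In streaming mode, call this function
 * repeatedly. You can also add the complete XML data in a single call.
 * If the same parser is reused for several XML documents, call reset()
 * before parsing the next one to clear the previous parse result.
 * @param data {const char*} XML data
 */
void update(const char* data);
```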

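```cpp
/**
 * Get all nodes in the XML object that have the given tag name
 * @param tag {const char*} tag name (case-insensitive)
 * @return {const std::vector<xml_node*>&} reference to the result set;
 *  if nothing matches, the set is empty, i.e. empty() == true
 *  Note: the xml_node objects in the returned array may be modified but
 *  must not be deleted, because the library frees them automatically
 */
const std::vector<xml_node*>& getElementsByTagName(const char* tag) const;

/**
 * Get all XML nodes that match the given multi-level tag name
 * @param tags {const char*} multi-level tag name, with the levels separated
 *  by '/'. For example, given XML data such as:
 *  <root> <first> <second> <third> text1 </third> </second> </first> ...
 *  <root> <first> <second> <third> text2 </third> </second> </first> ...
 *  <root> <first> <second> <third> text3 </third> </second> </first> ...
 *  the path root/first/second/third finds all matching nodes in one call
 * @return {const std::vector<xml_node*>&} the matching XML nodes;
 *  if nothing matches, the set is empty, i.e. empty() == true
 *  Note: the xml_node objects in the returned array may be modified but
 *  must not be deleted, because the library frees them automatically
 */
const std::vector<xml_node*>& getElementsByTags(const char* tags) const;

/**
 * Get all XML elements whose "name" attribute has the given value
 * @param value {const char*} value of the attribute named "name"
 * @return {const std::vector<xml_node*>&} reference to the result set;
 *  if nothing matches, the set is empty, i.e. empty() == true
 *  Note: the xml_node objects in the returned array may be modified but
 *  must not be deleted, because the library frees them automatically
 */
const std::vector<xml_node*>& getElementsByName(const char* value) const;

/**
 * Get all XML elements that have the given attribute name and value
 * @param name {const char*} attribute name
 * @param value {const char*} attribute value
 * @return {const std::vector<xml_node*>&} reference to the result set;
 *  if nothing matches, the set is empty, i.e. empty() == true
 */
const std::vector<xml_node*>& getElementsByAttr(const char* name, const char* value) const;

/**
 * Get the XML element with the given id
 * @param id {const char*} id value
 * @return {const xml_node*} the XML element; NULL means no element matches.
 *  The returned object must not be freed
 */
const xml_node* getElementById(const char* id) const;
```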
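The sketch below shows one way a '/'-separated path like the one getElementsByTags() accepts could be matched level by level. It is not acl's implementation. `Node`, `split_path()` and `find_by_tags()` are hypothetical names. Tags are compared case-sensitively here, and the tree is a plain in-memory structure whose data-less top node holds the top-level elements.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory node (not acl::xml_node).
struct Node {
    std::string tag;
    std::string text;
    std::vector<Node> children;
};

// Split "root/first/second/third" into tag names, skipping empty parts.
static std::vector<std::string> split_path(const std::string& path) {
    std::vector<std::string> parts;
    std::istringstream in(path);
    std::string part;
    while (std::getline(in, part, '/'))
        if (!part.empty())
            parts.push_back(part);
    return parts;
}

// Follow the path one level at a time, starting from the children of a
// data-less top node; each level keeps only the children whose tag matches.
static std::vector<const Node*> find_by_tags(const Node& top, const std::string& path) {
    std::vector<std::string> tags = split_path(path);
    if (tags.empty())
        return {};
    std::vector<const Node*> current{&top};
    for (const std::string& tag : tags) {
        std::vector<const Node*> next;
        for (const Node* n : current)
            for (const Node& child : n->children)
                if (child.tag == tag)
                    next.push_back(&child);
        current.swap(next);
    }
    return current;
}
```

Each level only looks at the direct children of the previous level's matches. A path that skips a level, such as root/second, therefore finds nothing.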

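```cpp
/**
 * Start traversing the XML object and get its first node
 * @return {xml_node*} NULL means the XML object has no nodes
 *  Note: the user must not free the returned node, because the
 *  library frees it automatically
 */
xml_node* first_node(void);

/**
 * Get the next node while traversing the XML object
 * @return {xml_node*} NULL means the traversal is finished
 *  Note: the user must not free the returned node, because the
 *  library frees it automatically
 */
xml_node* next_node(void);
```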
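first_node() and next_node() keep the traversal position inside the xml object, so the caller just loops until NULL. The sketch below shows one way a stateful walker like this could visit every node of a tree in depth-first pre-order using an explicit stack. `Node` and `TreeWalker` are hypothetical. The pre-order visiting order is an assumption of this sketch, not a documented guarantee of acl::xml.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory node (not acl::xml_node).
struct Node {
    std::string tag;
    std::vector<Node> children;
};

// Hypothetical walker with the first_node()/next_node() calling pattern:
// the traversal state lives in the object, and nodes come out in
// depth-first pre-order (a parent before its children).
class TreeWalker {
public:
    explicit TreeWalker(const Node& top) : top_(top) {}

    // Restart the traversal; the data-less top node itself is skipped.
    const Node* first_node() {
        stack_.clear();
        push_children(top_);
        return next_node();
    }

    // Return the next node, or nullptr once every node has been visited.
    const Node* next_node() {
        if (stack_.empty())
            return nullptr;
        const Node* n = stack_.back();
        stack_.pop_back();
        push_children(*n);
        return n;
    }

private:
    void push_children(const Node& n) {
        // Push in reverse order so the first child is popped first.
        for (auto it = n.children.rbegin(); it != n.children.rend(); ++it)
            stack_.push_back(&*it);
    }

    const Node& top_;
    std::vector<const Node*> stack_;
};
```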

2. The main methods of the xml_node class

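```cpp
/**
 * Get the tag name of this XML node
 * @return {const char*} the tag name; NULL means the node has no tag.
 *  Just in case, the caller should check the return value
 */
const char* tag_name(void) const;

/**
 * Get the value of this XML node's ID attribute
 * @return {const char*} the ID value if the attribute exists, otherwise NULL
 */
const char* id(void) const;

/**
 * Get the text content of this XML node
 * @return {const char*} NULL means the node has no text content
 */
const char* text(void) const;

/**
 * Get the value of one attribute of this XML node
 * @param name {const char*} attribute name
 * @return {const char*} attribute value; NULL means the attribute does not exist
 */
const char* attr_value(const char* name) const;

/**
 * To iterate over all attributes of the node, call this function first
 * to get the first attribute
 * @return {const xml_attr*} the first attribute; NULL means the node
 *  has no attributes
 */
const xml_attr* first_attr(void) const;

/**
 * While iterating over the node's attributes, call this function to get
 * the next attribute
 * @return {const xml_attr*} the next attribute; NULL means the iteration
 *  is finished
 */
const xml_attr* next_attr(void) const;
```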

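```cpp
/**
 * Get a reference to the parent node of this node
 * @return {xml_node&}
 */
xml_node& get_parent(void) const;

/**
 * Get the first child of this node; call this first when iterating
 * over the children
 * @return {xml_node*} NULL means the node has no children
 */
xml_node* first_child(void);

/**
 * Get the next child of this node
 * @return {xml_node*} NULL means the iteration is finished
 */
xml_node* next_child(void);

/**
 * Get the number of direct children of this XML node
 * @return {int} always >= 0
 */
int children_count(void) const;
```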
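first_child() and next_child() follow the same pattern one level down, and get_parent() walks back up. Because next_child() takes no arguments, the sketch below stores the child cursor in the node itself. This is an assumption, and the names are hypothetical. One consequence in this sketch is that two nested loops over the same node's children would share a single cursor.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node (not acl::xml_node) with a parent link and a child
// cursor kept inside the node, mirroring first_child()/next_child().
class Node {
public:
    explicit Node(std::string tag, Node* parent = nullptr)
        : tag_(std::move(tag)), parent_(parent) {}

    Node& add_child(const std::string& tag) {
        children_.push_back(std::make_unique<Node>(tag, this));
        return *children_.back();
    }

    const std::string& tag_name() const { return tag_; }
    // Must not be called on the top node, whose parent_ is null.
    Node& get_parent() const { return *parent_; }
    int children_count() const { return static_cast<int>(children_.size()); }

    // Rewind the cursor and return the first child, or nullptr if none.
    Node* first_child() {
        cursor_ = 0;
        return next_child();
    }

    // Return the child at the cursor and advance, or nullptr when done.
    Node* next_child() {
        if (cursor_ >= children_.size())
            return nullptr;
        return children_[cursor_++].get();
    }

private:
    std::string tag_;
    Node* parent_;
    std::vector<std::unique_ptr<Node>> children_;
    std::size_t cursor_ = 0;  // index of the next child to hand out
};
```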

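There are quite a few interfaces listed above, and some others are not listed at all, so it is easy to get lost. Below is a simple example showing how to use these two classes.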

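```cpp
#include <stdio.h>
#include <vector>
#include "xml.hpp"

static void test1(void)
{
	// Sample XML data: a prolog, a DOCTYPE with an entity, a comment and a
	// <user> element that has an attribute, text and a child element
	const char *data =
		"<?xml version=\"1.0\"?>\r\n"
		"<!DOCTYPE root [\r\n"
		"	<!ENTITY xmllint \"<command>xmllint</command>\">\r\n"
		"]>\r\n"
		"<root>test\r\n"
		"	<!-- a comment -->\r\n"
		"	<user name=\"zsx\">zsx\r\n"
		"		<age>38</age>\r\n"
		"	</user>\r\n"
		"</root>\r\n";

	acl::xml xml;      // define the XML parser object
	xml.update(data);  // feed the XML data to the parser

	// Get all XML nodes with the given tag name
	const std::vector<acl::xml_node*>& elements = xml.getElementsByTagName("user");
	if (!elements.empty()) {
		// Iterate over the result set
		std::vector<acl::xml_node*>::const_iterator cit = elements.begin();
		for (; cit != elements.end(); ++cit) {
			acl::xml_node *node = *cit;
			printf("tagname: %s, text: %s\n",
				node->tag_name() ? node->tag_name() : "",
				node->text() ? node->text() : "");
			// Iterate over all attributes of the node
			const acl::xml_attr* attr = node->first_attr();  // first attribute
			while (attr) {
				printf("test1: %s=%s\r\n", attr->get_name(), attr->get_value());
				attr = node->next_attr();  // next attribute
			}
		}
	}
}

int main(void)
{
	test1();
	return 0;
}
```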

The example above passes all the XML data to the acl::xml parser at once. The following approach also works:

```cpp
	const char* ptr = data;
	char  buf[2];

	// Feed the parser one byte per call
	while (*ptr) {
		buf[0] = *ptr++;
		buf[1] = 0;
		xml.update(buf);
	}
```

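Feeding the XML parser one byte per call is inefficient. It is only meant to show that the acl_cpp XML parser is a streaming parser. This property is especially helpful in network communication, for example when parsing XML carried in an HTTP data stream, where the data arrives in pieces.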
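Byte-by-byte feeding works because a streaming parser keeps its state between update() calls. The toy `StreamingTagCounter` below is hypothetical and is not acl's parser. It only counts start tags, but it shows the idea: a tag split across calls is accumulated until its closing '>' arrives. As a result, feeding the whole document and feeding it one byte at a time give the same count. For brevity it does not handle '>' inside attribute values, comments or CDATA sections.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical toy parser (not acl's implementation): counts element start
// tags while keeping its state between update() calls, so the input may be
// split at any byte boundary.
class StreamingTagCounter {
public:
    void update(const char* data) {
        for (const char* p = data; *p; ++p) {
            if (!in_tag_) {
                if (*p == '<') {     // a tag starts; its body follows
                    in_tag_ = true;
                    tag_.clear();
                }
            } else if (*p == '>') {  // the tag is complete
                in_tag_ = false;
                finish_tag();
            } else {
                tag_ += *p;          // the body may span several calls
            }
        }
    }

    // Clear all state before parsing another document, like xml::reset().
    void reset() {
        in_tag_ = false;
        tag_.clear();
        start_tags_ = 0;
    }

    std::size_t start_tags() const { return start_tags_; }

private:
    void finish_tag() {
        // Skip end tags, processing instructions and <!...> declarations.
        if (tag_.empty() || tag_[0] == '/' || tag_[0] == '?' || tag_[0] == '!')
            return;
        ++start_tags_;  // <a> and <a/> both count as one start tag
    }

    bool in_tag_ = false;
    std::string tag_;
    std::size_t start_tags_ = 0;
};
```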

In addition, the XML parser provides a pair of functions for traversing all node objects, first_node and next_node. Together they return every node of a complete XML tree, for example:

```cpp
	acl::xml xml;
	// ...
	acl::xml_node* node = xml.first_node();  // get the first XML node
	while (node) {
		printf("tag: %s\r\n", node->tag_name() ? node->tag_name() : "");
		node = xml.next_node();  // get the next XML node
	}
```

The XML tree object is not the only one with traversal functions. An xml_node object can also iterate over its direct children, for example:

```cpp
	acl::xml xml;
	// ...
	acl::xml_node* node = xml.first_node();  // get the first xml_node of the xml object
	if (node) {
		// get the first direct child of this xml_node
		acl::xml_node* child = node->first_child();
		while (child) {
			printf("child tag: %s\r\n", child->tag_name() ? child->tag_name() : "");
			child = node->next_child();  // get the next direct child
		}
	}
```

II. Generating XML strings

To make it easy to build XML objects, the acl_cpp xml module provides functions for generating an XML data stream. The following shows how to use them.

1. Related functions in the xml class:

```cpp
/**
 * Create an xml_node object
 * @param tag {const char*} tag name
 * @param text {const char*} text string
 * @return {xml_node&} the user does not need to free the new xml_node:
 *  all such nodes are freed automatically when the xml object is freed,
 *  and the user may also call reset to free them once they are no
 *  longer needed
 */
xml_node& create_node(const char* tag, const char* text = NULL);

/**
 * Get the root node object
 * @return {xml_node&}
 */
xml_node& get_root();
```

The XML parser has a virtual root node. This node stores no XML data itself, but every xml_node sits beneath it: the top-level nodes are its children.
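Because the virtual root carries no data, serializing a document amounts to writing each of the root's children in turn. The minimal sketch below is built on that assumption and uses hypothetical `Elem` and `build_xml()` names. Text content is left out for brevity, attribute values are escaped, and output is appended to the caller's string rather than replacing it.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type for this sketch (not acl::xml_node).
struct Elem {
    std::string tag;
    std::vector<std::pair<std::string, std::string>> attrs;
    std::vector<Elem> children;
};

// Escape the characters that must not appear verbatim in an attribute value.
static void append_escaped(const std::string& s, std::string& out) {
    for (char c : s) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default: out += c; break;
        }
    }
}

// Write one element and its subtree to the end of `out`.
static void append_elem(const Elem& e, std::string& out) {
    out += '<';
    out += e.tag;
    for (const auto& a : e.attrs) {
        out += ' ';
        out += a.first;
        out += "=\"";
        append_escaped(a.second, out);
        out += '"';
    }
    if (e.children.empty()) {
        out += "/>";  // no children: self-closing tag
        return;
    }
    out += '>';
    for (const Elem& c : e.children)
        append_elem(c, out);
    out += "</";
    out += e.tag;
    out += '>';
}

// The virtual root holds no data, so only its children are written.
// Output is appended; whatever `out` already holds is kept.
static void build_xml(const Elem& vroot, std::string& out) {
    for (const Elem& top : vroot.children)
        append_elem(top, out);
}
```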

2. Related functions in the xml_node class:

```cpp
/**
 * Add an attribute to the XML node
 * @param name {const char*} attribute name
 * @param value {const char*} attribute value
 * @return {xml_node&}
 */
xml_node& add_attr(const char* name, const char* value);

/**
 * Set the text content of the XML node
 * @param str {const char*} string content
 * @return {xml_node&}
 */
xml_node& set_text(const char* str);
```

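```cpp
/**
 * Add an xml_node child object to this XML node
 * @param child {xml_node*} child node object
 * @param return_child {bool} whether to return the child's reference
 * @return {xml_node&} the child's reference if return_child is true,
 *  otherwise this node's reference
 */
xml_node& add_child(xml_node* child, bool return_child = false);

/**
 * Add an xml_node child object to this XML node
 * @param child {xml_node&} child node object
 * @param return_child {bool} whether to return the child's reference
 * @return {xml_node&} the child's reference if return_child is true,
 *  otherwise this node's reference
 */
xml_node& add_child(xml_node& child, bool return_child = false);

/**
 * Add a child node with the given tag name to this XML node
 * @param tag {const char*} tag name of the child node
 * @param return_child {bool} whether to return the child's reference
 * @param str {const char*} text string
 * @return {xml_node&} the child's reference if return_child is true,
 *  otherwise this node's reference
 */
xml_node& add_child(const char* tag, bool return_child = false,
	const char* str = NULL);
```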
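The return_child flag is what makes chained construction possible. With true, the call moves down to the new child; with false, it stays on the current node; and get_parent() moves back up. The sketch below models just that behavior with a hypothetical `Elem` class, not acl::xml_node. Children are held through `std::unique_ptr`, so references returned earlier stay valid as siblings are added.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element (not acl::xml_node) that only models chaining.
class Elem {
public:
    explicit Elem(std::string tag, Elem* parent = nullptr)
        : tag_(std::move(tag)), parent_(parent) {}

    Elem& add_attr(const std::string& name, const std::string& value) {
        attrs_.emplace_back(name, value);
        return *this;  // stay on this element
    }

    Elem& set_text(const std::string& text) {
        text_ = text;
        return *this;
    }

    // true: return the new child (move down); false: return this element.
    Elem& add_child(const std::string& tag, bool return_child = false) {
        children_.push_back(std::make_unique<Elem>(tag, this));
        return return_child ? *children_.back() : *this;
    }

    // Move back up; must not be called on the top element (parent_ is null).
    Elem& get_parent() const { return *parent_; }

    const std::string& tag() const { return tag_; }
    const std::string& text() const { return text_; }
    std::size_t attr_count() const { return attrs_.size(); }
    std::size_t child_count() const { return children_.size(); }
    const Elem& child(std::size_t i) const { return *children_.at(i); }

private:
    std::string tag_;
    std::string text_;
    Elem* parent_;
    std::vector<std::pair<std::string, std::string>> attrs_;
    // unique_ptr keeps each child at a stable address as the vector grows
    std::vector<std::unique_ptr<Elem>> children_;
};
```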

Here are a few simple examples showing how to generate an XML data stream:

```cpp
	acl::xml xml;

	acl::xml_node& root = xml.get_root();  // get the root node of the xml object
	acl::xml_node *node1, *node2, *node11;

	// Create an xml_node
	node1 = &xml.create_node("test1");
	// Add attributes to node1
	node1->add_attr("name1_1", "value1_1")
		.add_attr("name1_2", "value1_2")
		.add_attr("name1_3", "value1_3");
	// Make node1 the first child of the xml root node
	root.add_child(node1);

	// Create an xml_node
	node11 = &xml.create_node("test11");
	// Add attributes to node11
	node11->add_attr("name11_1", "value11_1")
		.add_attr("name11_2", "value11_2")
		.add_attr("name11_3", "value11_3");
	// Make node11 the first child of node1 (node1 is a pointer, hence ->)
	node1->add_child(node11);

	// Create an xml_node
	node2 = &xml.create_node("test2");
	// Add attributes to node2
	node2->add_attr("name2_1", "value2_1")
		.add_attr("name2_2", "value2_2")
		.add_attr("name2_3", "value2_3");
	// Make node2 the second child of the xml root node
	root.add_child(node2);

	acl::string buf("<?xml version=\"1.0\"?>");
	// Generate the XML data stream. Note: build_xml works in append mode on
	// buf; if buf already holds data, build_xml only adds to the end of it
	xml.build_xml(buf);
	printf("xml: %s\r\n", buf.c_str());  // print the generated XML
```

In fact, the example above can be written more concisely:

```cpp
	acl::xml xml;

	xml.get_root()
		.add_child("test1", true)  // the second argument is true, so add_child returns the new child
			.add_attr("name1_1", "value1_1")  // add attributes to test1
			.add_attr("name1_2", "value1_2")
			.add_attr("name1_3", "value1_3")
			.add_child("test11", true)  // add a child with tag test11 to test1
				.add_attr("name11_1", "value11_1")  // add attributes to test11
				.add_attr("name11_2", "value11_2")
				.add_attr("name11_3", "value11_3")
				.get_parent()  // parent of test11, i.e. test1
			.get_parent()  // parent of test1, i.e. the xml root node
		.add_child("test2", true)  // add a test2 child to the root node
			.add_attr("name2_1", "value2_1")  // add attributes to test2
			.add_attr("name2_2", "value2_2")
			.add_attr("name2_3", "value2_3");
```

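As you can see, the second style is more concise and its logical structure is clearer; it reads in one smooth flow :). Of course, you can use whichever style suits your habits.

Also, a careful look at the declaration of the xml_node class shows that its constructor and destructor are private. This means you cannot create or destroy xml_node objects by hand with new or delete, nor define an object like acl::xml_node node. xml_node objects can only be created through an acl::xml object or an acl::xml_node object. All of them are destroyed automatically inside the acl::xml object: when the xml object is destroyed, the xml_node nodes it created dynamically are destroyed with it. To destroy all acl::xml_node objects before the acl::xml object itself is destroyed, call the reset() method of the acl::xml class.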
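The ownership rule described above can be expressed in standard C++ by making the node's constructor and destructor private and befriending the owning class. In the hypothetical sketch below, `Doc` and `Node` are not acl classes, and acl's own internals may well differ. `create_node()` is the only way to obtain a node, and `reset()` or the owner's destructor frees every node at once.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

class Doc;

// Hypothetical node: constructor and destructor are private, so user code
// cannot write `new Node`, `delete node` or `Node n(...)`; only Doc can.
class Node {
public:
    const std::string& tag() const { return tag_; }

private:
    friend class Doc;
    explicit Node(std::string tag) : tag_(std::move(tag)) {}
    ~Node() = default;
    Node(const Node&) = delete;
    Node& operator=(const Node&) = delete;

    std::string tag_;
};

// The owner creates every node and destroys them in reset() or in its own
// destructor, so nodes never outlive the document that made them.
class Doc {
public:
    Doc() = default;
    Doc(const Doc&) = delete;
    Doc& operator=(const Doc&) = delete;
    ~Doc() { reset(); }

    Node& create_node(const std::string& tag) {
        Node* n = new Node(tag);
        try {
            nodes_.push_back(n);
        } catch (...) {
            delete n;  // do not leak the node if the vector cannot grow
            throw;
        }
        return *n;
    }

    // Free all nodes early, before the document itself is destroyed.
    void reset() {
        for (Node* n : nodes_)
            delete n;  // allowed: Doc is a friend of Node
        nodes_.clear();
    }

    std::size_t node_count() const { return nodes_.size(); }

private:
    std::vector<Node*> nodes_;
};
```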

Examples that use xml can be found in the samples/xml directory.

acl_cpp download:

Original article: http://zsxxsz.iteye.com/blog/1506643