A third way to use boost::serialization

published at 08.09.2015 15:03 by Jens Weller

The 10th part of my series about writing applications with Qt and boost is about using boost::serialization. The last part was about how to create the basic structure for a project with boost::filesystem, and how to use boost::filesystem to index folders. But a lot of data just can't be represented as single files, so how should it be stored?

The video, if you'd rather listen than read:

Originally I planned to use a database, as I already have some code that handles the SQL queries nicely for me, and most of my other applications currently use it to store their data too. That's why most of my classes had an id field from day one, just to enable them to refer to an instance stored in a database. But if I could avoid a database by simply storing my data in a file, things would be easier, and my code wouldn't need to be scattered with SQL queries. If I couldn't find a reasonable approach, I could still opt for a database anyway.

boost::serialization

While other serialization libraries exist, boost has shipped its own for quite some time now. I used it years ago, but it took some time to get used to its way of doing things again. You may want to look at the two ways the documentation offers to put boost::serialization to work: intrusive and non-intrusive. I already spoiled it with the title; here is what I don't like about both ways:

So I found a third way of doing things. It's not a magic silver bullet, it doesn't do any reflection, and it is intrusive. In the end, for each new member in a class, all you have to do is add it to a macro. Setting up a new type for serialization means adding a macro, and if the class is derived, adding another line. That's all, you're done. To achieve this, I first have to introduce you to my way of being intrusive, TUPLE_ACCESS:

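//usage (inside the class body, after the members it lists):
TUPLE_ACCESS(name,pos)

// TUPLE_ACCESS Macro Implementation
// BOOST_PP_VARIADICS only has an effect if it is defined before any Boost.Preprocessor header
#define BOOST_PP_VARIADICS 1
#include <boost/preprocessor/punctuation/comma_if.hpp>
#include <boost/preprocessor/seq/for_each_i.hpp>
#include <boost/preprocessor/variadic/to_seq.hpp>
#include <tuple>

// alias, so that tie can be switched between std and boost
namespace access = std;

#define TIE_ELEMENT(TE) TE
// emits elem for the first element and , elem for every following one
#define TIE_MACRO(r, data, i, elem) BOOST_PP_COMMA_IF(i) TIE_ELEMENT(elem)
// TIE(a,b) expands to access::tie( a , b )
#define TIE(...) access::tie( BOOST_PP_SEQ_FOR_EACH_I(TIE_MACRO, _, BOOST_PP_VARIADIC_TO_SEQ(__VA_ARGS__)) )
// adds tuple_access(), returning a tuple of references to the listed members
#define TUPLE_ACCESS(...) auto tuple_access() -> decltype( TIE(__VA_ARGS__) ){ return TIE(__VA_ARGS__);}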

This macro adds a method called tuple_access() to each class where it is used. This method simply returns a tuple of references to the members passed as the variadic macro arguments. This is done via tie; I use a namespace alias to be able to switch between boost and the STL, because in some cases boost libraries don't support STL types: e.g. boost::serialization can serialize shared_ptr out of the box only in its boost flavor. The 'magic' behind TUPLE_ACCESS is driven by the great boost preprocessor library.

So my serialization code requires that a type has a member called tuple_access() returning a tuple-like type with references to the members being serialized. The type itself knows nothing about being serialized; it only has to provide this easy interface. My solution then builds on the non-intrusive way to use boost::serialization.
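To make this interface concrete, here is roughly what TUPLE_ACCESS(name,pos) expands to, written out by hand with access aliased to std. Item and its two members are hypothetical; the point is that the returned tuple holds references, so writing through it changes the object itself, which is what loading relies on.

```cpp
#include <string>
#include <tuple>

// Hypothetical class; tuple_access() is the hand-written expansion
// of TUPLE_ACCESS(name,pos) with access aliased to std.
struct Item
{
    std::string name;
    int pos = 0;

    // The members must be declared above this line: a trailing return type
    // only sees members that are already declared.
    auto tuple_access() -> decltype(std::tie(name, pos)) { return std::tie(name, pos); }
};
```

For Item, tuple_access() returns a std::tuple<std::string&, int&>, so an assignment like std::get<1>(item.tuple_access()) = 5 sets item.pos.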

Next comes the actual serialization, for which I obviously use the non-intrusive solution:

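#include <boost/fusion/adapted/std_tuple.hpp> // lets fusion iterate over std::tuple
#include <boost/fusion/include/for_each.hpp>
#include <boost/serialization/base_object.hpp>
namespace fusion = boost::fusion;

// Type and Base stand for the concrete classes, filled in by the macros
// serializing a non derived type
template<class Archive>
void serialize(Archive& ar, Type &t, const unsigned int )
{
    auto tpl = t.tuple_access();
    fusion::for_each(tpl, fusion_helper<Archive>(ar));
}
//serializing a derived type: the base class part first, then the own members
template<class Archive>
void serialize(Archive& ar, Type &t, const unsigned int )
{
    ar & boost::serialization::base_object<Base>(t);
    auto tpl = t.tuple_access();
    fusion::for_each(tpl, fusion_helper<Archive>(ar));
}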

This code is driven by boost::fusion: for_each serializes every element of the tuple. This works through the simple class template fusion_helper:

template< class Archive >
class fusion_helper
{
    Archive& ar;
public:
    explicit fusion_helper(Archive& ar):ar(ar){}
    template< class T >
    void operator()( T&t)const
    {
        ar & t;
    }
};
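To show the mechanism without boost, here is a minimal sketch in which C++17's std::apply with a fold expression stands in for fusion::for_each. TextWriter is a hypothetical toy archive that only mimics the `ar & t` syntax of boost::serialization, and Item is a hypothetical type; tuple_helper plays the role of fusion_helper.

```cpp
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical toy archive: operator& writes each value followed by a space.
struct TextWriter
{
    std::ostringstream out;
    template<class T>
    TextWriter& operator&(const T& t) { out << t << ' '; return *this; }
};

// Same idea as fusion_helper: a functor feeding one element to the archive.
template<class Archive>
class tuple_helper
{
    Archive& ar;
public:
    explicit tuple_helper(Archive& a) : ar(a) {}
    template<class T>
    void operator()(T& t) const { ar & t; }
};

// Stand-in for fusion::for_each: calls f on every tuple element, in order.
template<class Tuple, class F>
void for_each_element(Tuple&& tpl, const F& f)
{
    std::apply([&f](auto&... elems) { (f(elems), ...); }, std::forward<Tuple>(tpl));
}

struct Item
{
    std::string name;
    int pos = 0;
    auto tuple_access() -> decltype(std::tie(name, pos)) { return std::tie(name, pos); }
};

template<class Archive>
void serialize(Archive& ar, Item& t)
{
    auto tpl = t.tuple_access();
    for_each_element(tpl, tuple_helper<Archive>(ar));
}
```

Serializing an Item with name "intro" and pos 3 into a TextWriter yields "intro 3 ": the members are visited in the order they were listed.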

Obviously this needs to be done for every serializable type, so I actually have those functions wrapped into two macros, SERIALIZE_TYPE and SERIALIZE_DERIVED_TYPE:

namespace boost { namespace serialization{
SERIALIZE_TYPE(Module) SERIALIZE_DERIVED_TYPE(TextElement,Module)
}}
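The two macro definitions aren't shown here. The following is a minimal sketch of what such macros could look like; the macro bodies, the TextWriter toy archive and the serialize_tuple helper are all assumptions, and std::apply (C++17) replaces fusion::for_each so that it compiles with the standard library alone. Macros for boost::serialization would additionally take the version parameter and use base_object<Base> instead of the static_cast.

```cpp
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical toy archive: writes each value followed by a space.
struct TextWriter
{
    std::ostringstream out;
    template<class T>
    TextWriter& operator&(const T& t) { out << t << ' '; return *this; }
};

// Feeds every element of a tuple of references to the archive.
template<class Archive, class Tuple>
void serialize_tuple(Archive& ar, Tuple&& tpl)
{
    std::apply([&ar](auto&... elems) { ((ar & elems), ...); }, std::forward<Tuple>(tpl));
}

// Hypothetical macro bodies: each one stamps out a serialize overload.
#define SERIALIZE_TYPE(Type) \
    template<class Archive> \
    void serialize(Archive& ar, Type& t) { serialize_tuple(ar, t.tuple_access()); }

#define SERIALIZE_DERIVED_TYPE(Type, Base) \
    template<class Archive> \
    void serialize(Archive& ar, Type& t) \
    { \
        serialize(ar, static_cast<Base&>(t)); /* base class part first */ \
        serialize_tuple(ar, t.tuple_access()); \
    }

struct Module
{
    std::string name;
    auto tuple_access() -> decltype(std::tie(name)) { return std::tie(name); }
};

struct TextElement : Module
{
    std::string text;
    auto tuple_access() -> decltype(std::tie(text)) { return std::tie(text); }
};

SERIALIZE_TYPE(Module)
SERIALIZE_DERIVED_TYPE(TextElement, Module)
```

Serializing a TextElement writes the Module members first and then its own, so name "intro" and text "hello" become "intro hello ".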

For normal types, SERIALIZE_TYPE is all the setup needed. For derived types there is one more thing to do: the type needs to be registered with the archive once, before the actual serialization starts:

template< class Archive >
inline void registerTypes(Archive& ar)
{
    ar.template register_type< TextElement >();
}
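As far as I understand boost::serialization, this registration matters when a derived object is saved or loaded through a pointer to its base class: when loading, the archive has to know which concrete type to construct. The following is only a toy analogy, not boost's implementation: a hypothetical TypeRegistry maps a tag, as it might be read from an archive, to a factory for the registered type.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

struct Module
{
    virtual ~Module() = default;
    virtual std::string kind() const { return "Module"; }
};

struct TextElement : Module
{
    std::string kind() const override { return "TextElement"; }
};

// Hypothetical registry: maps the tag stored in an archive to a factory.
class TypeRegistry
{
    std::map<std::string, std::function<std::unique_ptr<Module>()>> factories;
public:
    template<class T>
    void register_type(const std::string& tag)
    {
        factories[tag] = [] { return std::unique_ptr<Module>(new T()); };
    }

    std::unique_ptr<Module> create(const std::string& tag) const
    {
        auto it = factories.find(tag);
        if (it == factories.end())
            return nullptr; // unregistered: the loader can't know what to construct
        return it->second();
    }
};
```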

This is already all you need to serialize your own simple types. In some places I had to switch from std::shared_ptr to boost::shared_ptr, as serialization currently can't handle the standard shared_ptr. Also, std::map is supported, but flat_map and flat_set from boost are not. I use them in many places, so I copied some of the serialization code for maps and sets and replaced std::map with flat_map, and did the same for flat_set. It works; I'm not sure if it's the perfect and correct way, but this is the header containing the required code to serialize flat_maps and flat_sets.

Another issue: serialization handles a shared_ptr very well, but it does not recognize a raw pointer obtained via shared_ptr::get as being held by a smart pointer.

What's still missing is the code that actually does the serialization. All of it is hidden in one cpp file, so that only one class needs access to it: Serializer.

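#include <boost/archive/text_iarchive.hpp>
#include <cstddef>
#include <string>

struct Serializer
{
    explicit Serializer(DocumentTreeItem::item_t& doc);
    void save();
    void load();
protected:
    DocumentTreeItem::item_t& doc;  // position in the document tree
    Document* document;
    std::string path;               // path to the archive file
    std::size_t t_dir,t_page;       // typeid(Dir).hash_code() and typeid(Page).hash_code()
    void loadDir(boost::archive::text_iarchive &iar, DocumentTreeItem::item_t& dir,bool first = false);
};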

To load or save a document, the position in the tree is needed, and also the path to the archive. The size_t members hold the values of typeid(Dir).hash_code() and typeid(Page).hash_code(). loadDir loads the tree from the archive.

First a quick look at save:

#include <fstream>
#include <typeinfo>
#include <boost/archive/text_oarchive.hpp>

void Serializer::save()
{
    int version = 1;
    std::ofstream out(path.c_str());
    boost::archive::text_oarchive oar{out}; // declared after out, so it is destroyed before the stream
    registerTypes<boost::archive::text_oarchive>(oar);
    oar << version;
    Document& d = *document;
    oar << d;
    oar << *document->getLayouts();
    // only the children of the web root are written, so their count comes first
    auto web = doc->getChild(document->getWebroot_index());
    int cnt = web->childCount();
    oar << cnt;
    // called after each visited node: a directory is followed by its child count
    TreeVisitor<SerializationVisitor<boost::archive::text_oarchive>> treevisitor(
        [&oar](const DocumentTreeItem::item_t& item){
            if(item->type_id() == typeid(Dir).hash_code()){
                int c = item->childCount();
                oar << c;
            }
        });
    SerializationVisitor<boost::archive::text_oarchive> sv(oar);
    treevisitor.visit(web,sv);
    oar << *document->getLists();
}

Currently, I save the data in text archives. First, a call to the registerTypes template function is needed to fully set up the archive (this saves me from having to call register_type<T> twice, once for loading and once for saving). The root or web node is not serialized, only its children, hence the child count comes first. I use a TreeVisitor class to visit every node in the web tree, while the SerializationVisitor class does the actual serialization. TreeVisitor has a callback, which is called after each tree node is visited; here it writes the child count of every directory.
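A standard-library-only sketch of the format this produces, under the assumption (read off loadDir further down) that every node is written as a type tag followed by its data, and every directory additionally by its child count and then its children. Node, saveNode and saveTree are hypothetical, and names must not contain spaces in this toy whitespace-separated format. Instead of typeid(...).hash_code() the sketch uses fixed integer tags: the C++ standard only guarantees hash_code to return the same value for the same type within a single execution of a program, so a value written to a file isn't guaranteed to match on the next run.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal tree: a node is either a directory or a page.
struct Node
{
    bool isDir = false;
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
};

// Fixed tags, stable across program runs.
constexpr int kDirTag = 1;
constexpr int kPageTag = 2;

// Pre-order walk: tag and data of each node; a directory is followed
// by its child count and then by its children.
void saveNode(std::ostream& out, const Node& n)
{
    out << (n.isDir ? kDirTag : kPageTag) << ' ' << n.name << ' ';
    if (n.isDir)
    {
        out << n.children.size() << ' ';
        for (const auto& c : n.children)
            saveNode(out, *c);
    }
}

// The root itself isn't written, only its child count and its children.
void saveTree(std::ostream& out, const Node& root)
{
    out << root.children.size() << ' ';
    for (const auto& c : root.children)
        saveNode(out, *c);
}
```

A root holding a directory "blog" (with one page "post1") and a page "index" is written as "2 1 blog 1 2 post1 2 index ".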

Loading this is a bit more interesting:

void Serializer::load()
{
    std::ifstream in(path.c_str());
    boost::archive::text_iarchive iar{in};
    registerTypes<boost::archive::text_iarchive>(iar);
    /*int version =*/ readValue< int >(iar);
    Document& d = *document;
    iar >> d;
    LayoutItem li = readValue< LayoutItem >(iar);
    DocumentTreeItem::item_t web = doc->emplace_back(FixedDir("web"));
    loadDir(iar,web,true);
}

OK, I lied. Reading a value from an archive usually requires declaring a variable first and then reading into it. I wrote a simple read function that deals with this boilerplate and simply reads and returns a value from an archive:

template< class T, class Archive >
T readValue(Archive& ar)
{
    T t;
    ar >> t;
    return t;
}

Manu Sánchez wrote a somewhat fancier version that doesn't require the type as a template argument. But in some cases I pass the value just read to a template method that creates a tree node, and there the fancy version won't do. This is the case in loadDir:

void Serializer::loadDir(boost::archive::text_iarchive &iar, DocumentTreeItem::item_t &dir, bool first)
{
    DocumentTreeItem::item_t itemDir=dir;
        if(!first) // the root directory isn't written to the archive
        itemDir = dir->emplace_back(readValue< Dir >(iar));
    int count = readValue< int >(iar);
    for(int i =0; i < count; ++i)
    {
        size_t tid = readValue< size_t >(iar);
        if(tid == t_dir)
            loadDir(iar,itemDir);
        else
            itemDir->emplace_back(readValue< Page >(iar));
    }
}
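Manu Sánchez's version isn't reproduced in this post. A common technique for dropping the explicit type is a proxy whose templated conversion operator performs the read once the target type is known; the sketch below shows it with a hypothetical ValueReader on a plain std::istream instead of a boost archive. It also shows why it can't work in loadDir above: a template like emplace_back deduces its parameter from the argument, so it would see ValueReader rather than Dir or Page.

```cpp
#include <istream>
#include <sstream>

// Hypothetical proxy: the value is only read when the proxy is converted,
// because only then is the target type known.
class ValueReader
{
    std::istream& in;
public:
    explicit ValueReader(std::istream& i) : in(i) {}

    template<class T>
    operator T() const
    {
        T t{};
        in >> t;
        return t;
    }
};

// No template argument needed at the call site.
inline ValueReader readValue(std::istream& in) { return ValueReader(in); }
```

`int i = readValue(in);` works because the declared type drives the conversion, while `auto v = readValue(in);` or passing the result to a deducing template only yields the proxy itself.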

When reading the tree back in, there is no serialized tree as such: my tree types have no idea what a tree looks like, and I also didn't want to make the tree itself serializable in some way. So in this method I simply have to read the format created by TreeVisitor. After loadDir, some more data still has to be read from the archive, and after that comes some housekeeping: restoring signals and some data that I chose to store in a way that isn't serializable.
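A matching sketch of the reading side, assuming a toy whitespace-separated format (child count, then per child a tag, its name and, for directories, the nested children) with a fixed tag of 1 for directories and a hypothetical Node type. As in loadDir, the tree is rebuilt purely by the reading code; the node type itself knows nothing about the format.

```cpp
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal tree node.
struct Node
{
    bool isDir = false;
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
};

constexpr int kDirTag = 1; // fixed tag, any other value means a page

template<class T>
T readValue(std::istream& in)
{
    T t{};
    in >> t;
    return t;
}

// Mirrors loadDir: read the child count, then for every child its tag and
// name; a directory recurses, anything else is a page.
void loadDir(std::istream& in, Node& dir)
{
    int count = readValue<int>(in);
    for (int i = 0; i < count; ++i)
    {
        auto child = std::make_unique<Node>();
        int tag = readValue<int>(in);
        child->name = readValue<std::string>(in);
        if (tag == kDirTag)
        {
            child->isDir = true;
            loadDir(in, *child);
        }
        dir.children.push_back(std::move(child));
    }
}
```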

I'll be at CppCon, and I plan to give two lightning talks (serialization and integrating the text editor) and an open content session about my CMS.