Searching and replacing in strings with boost
published at 11.02.2016 12:41 by Jens Weller
Save to Instapaper Pocket
The next big milestone for my CMS is to actually generate HTML files, and I'm almost there. I'll reach it in the next two weeks, most code is written, just a little bit of refactoring is needed. This blog post is about searching and replacing in strings. As I started last week with implementing the functionality, that turns the data in my CMS into an HTML website.
There needs to be a lot of text transformed, in order to turn a shared structure like a cross page layout into a single, special HTML file, one of those transformations is, to replace the internal links with the correct links. A link to a different page in the same website cannot be represented as a text link, instead it is represented by a linkid, which corresponds to the Page it points to. This is to have still the correct link, if the page is renamed or moved.
Finding and replacing link ids
This could be done with a regex, "(\\[linkid\\:)([0-9])+(\\])", and boost offers great support for regular expressions via boost::regex and boost::xpressive. While boost::regex is the library that was the blueprint for std::regex, boost::xpressive allows you to create regular expressions as C++ code, they are then checked at compile time! If my search pattern would be slightly more complex, I would have used boost::xpressive to search and replace my links in the HTML text.
But as my search pattern is not really complex, and I am actually could be searching for "[linkid:", there is an alternative: boost::boyer_moore and boost::boyer_moore_horspool. For my use case, boyer_moore_horspool is better, as it has less inner complexity then boyer_moore, and the string I am searching is short. For both algorithms it is true, that the search gets faster with a longer search term. Both are dedicated string search algorithms, and they share the same interface, the constructor takes a pair of iterators for the search pattern, the operator() a pair of iterators for the search able text. The call operator also returns an iterator pointing to the first occurence of the pattern. Both are searching for a string in another string, there is no flexibility in this pattern like in a regular expression.
As the search object is constant after its creation, I use for each pattern one search object, which is given as a reference to the function doing the replacement:
std::string replaceLinkIdsWithLinks(const std::string& text,const LinkHandler& links,const boost::algorithm::boyer_moore_horspool< std::string::const_iterator > &searcher) { std::string texthtml = text; boost::container::flat_set< std::string > linkids; auto it = searcher(text.begin(),text.end()); while(it != text.end()) { it = std::next(it,8); auto end = std::find_if_not(it,text.end(),[](const char& c){return std::isdigit(c);});
if(it != end) linkids.insert(std::string(it,end)); it = searcher(end,text.end()); } for(auto& id: linkids) { auto link = links.getLink(std::stoi(id)); boost::algorithm::replace_all(texthtml,"[linkid:"+id+"]",link.empty()?"/":link); } return texthtml; }
Unlike last weeks blog entry, I do use a raw loop for the search. I am not sure if it could be replaced by any STL algorithm. Once a pattern has been found, I need to advance the iterator to the beginning of the number, the actual linkid which I need to extract to look up its link in the LinkHandler. I'll simply find the end of the number with find_if_not, and insert the found number into a flat_set.
Replacing is rather simple, I look up the link for each id, and then use replace_all. If the link is not found, the / should ensure that the link stays valid and that the pattern is fully replaced.
boost::algorithm
I quickly want to mention, that there is much more to find in boost::algorithm and boost::string_algo, then the above shows. It contains not only the search algorithms, but also all C++11 and C++14 algorithms in a C++03 compatible way.
One of my tasks this week was to get boostache to compile, boostache uses only a little bit of C++14, and I'm currently on an older, not supporting C++14 MinGW compiler. In the tests, the C++14 version of mismatch was used, which I could replace with boost::mismatch, to achieve C++11 compatibility.
Join the Meeting C++ patreon community!
This and other posts on Meeting C++ are enabled by my supporters on patreon!