c++ - How do I iterate over the words of a string?

Tags : c++, string, split

Top 5 Answers for c++ - How do I iterate over the words of a string?



I use this to split a string by a delimiter. The first overload writes the results through an output iterator (for example, into a pre-constructed vector); the second returns a new vector.

    #include <string>
    #include <sstream>
    #include <vector>
    #include <iterator>

    template <typename Out>
    void split(const std::string &s, char delim, Out result) {
        std::istringstream iss(s);
        std::string item;
        while (std::getline(iss, item, delim)) {
            *result++ = item;
        }
    }

    std::vector<std::string> split(const std::string &s, char delim) {
        std::vector<std::string> elems;
        split(s, delim, std::back_inserter(elems));
        return elems;
    }

Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:

    std::vector<std::string> x = split("one:two::three", ':');
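
To make the empty-token behaviour concrete, here is a minimal sketch. It repeats the getline-based split in a compact form and adds a hypothetical helper, `split_skip_empty`, that drops empty items. The helper's name and its filtering policy are assumptions for illustration and are not part of the answer above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Same getline-based approach as above: empty tokens are kept.
std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

// Hypothetical variant (not in the original answer): skip empty tokens.
std::vector<std::string> split_skip_empty(const std::string &s, char delim) {
    std::vector<std::string> elems;
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        if (!item.empty()) {
            elems.push_back(item);
        }
    }
    return elems;
}
```

With the input "one:two::three", `split` yields four items (the third is empty) and `split_skip_empty` yields three. One detail worth knowing: because the loop stops when `std::getline` hits the end of the stream, a trailing delimiter such as in "a:" does not produce a final empty token.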


For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.

    #include <iostream>
    #include <string>
    #include <sstream>
    #include <algorithm>
    #include <iterator>

    int main() {
        using namespace std;
        string sentence = "And I feel fine...";
        istringstream iss(sentence);
        copy(istream_iterator<string>(iss),
             istream_iterator<string>(),
             ostream_iterator<string>(cout, "\n"));
    }

Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.

    vector<string> tokens;
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         back_inserter(tokens));

... or create the vector directly:

    vector<string> tokens{istream_iterator<string>{iss},
                          istream_iterator<string>{}};
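
The fragments above can be wrapped into a small reusable function. This is only a sketch: the name `words` is hypothetical, and it builds a fresh `istringstream` on every call, since a stream that has already been read to the end (for example by the earlier `copy` to `cout`) yields no further tokens.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect whitespace-separated words into a vector.
std::vector<std::string> words(const std::string &sentence) {
    std::istringstream iss(sentence);
    std::istream_iterator<std::string> first{iss};
    std::istream_iterator<std::string> last{};  // end-of-stream iterator
    // The iterator-range constructor reads tokens until the stream is exhausted.
    return std::vector<std::string>(first, last);
}
```

For the sentence used above, `words("And I feel fine...")` produces the four tokens "And", "I", "feel" and "fine...".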


A possible solution using Boost might be:

    #include <boost/algorithm/string.hpp>

    std::vector<std::string> strs;
    boost::split(strs, "string to split", boost::is_any_of("\t "));

This approach might be even faster than the stringstream approach. And since it is a generic template function, it can be used to split other types of strings (wchar_t-based, UTF-8, etc.) using all kinds of delimiters.

See the documentation for details.



    #include <vector>
    #include <string>
    #include <sstream>

    int main() {
        std::string str("Split me by whitespaces");
        std::string buf;                 // Have a buffer string
        std::stringstream ss(str);       // Insert the string into a stream

        std::vector<std::string> tokens; // Create vector to hold our words

        while (ss >> buf)
            tokens.push_back(buf);

        return 0;
    }
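
If the goal is only to iterate over the words rather than store them, the same extraction loop can feed a callback directly. The sketch below assumes a hypothetical helper named `for_each_word`; it is not part of the answer above, just one way to reuse the `>>` loop.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: call fn(word) once per whitespace-separated word.
template <typename Fn>
void for_each_word(const std::string &text, Fn fn) {
    std::istringstream iss(text);
    std::string word;
    // operator>> skips leading whitespace, so runs of spaces, tabs and
    // newlines between words never produce empty tokens.
    while (iss >> word) {
        fn(word);
    }
}
```

A caller could pass a lambda, for example `for_each_word(str, [](const std::string &w) { std::cout << w << '\n'; });` (with `<iostream>` included), to print each word on its own line.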


For those who are uncomfortable sacrificing efficiency for code size, and who see efficiency as a kind of elegance, the following should hit a sweet spot (I also think the templated container parameter is an elegant addition):

    #include <string>

    template <class ContainerT>
    void tokenize(const std::string& str, ContainerT& tokens,
                  const std::string& delimiters = " ", bool trimEmpty = false)
    {
        std::string::size_type pos, lastPos = 0, length = str.length();

        using value_type = typename ContainerT::value_type;
        using size_type  = typename ContainerT::size_type;

        while (lastPos < length + 1)
        {
            // Find the next delimiter; treat end of string as a final delimiter.
            pos = str.find_first_of(delimiters, lastPos);
            if (pos == std::string::npos)
            {
                pos = length;
            }

            // Emit the token unless it is empty and trimEmpty is set.
            if (pos != lastPos || !trimEmpty)
                tokens.push_back(value_type(str.data() + lastPos,
                                            (size_type)pos - lastPos));

            lastPos = pos + 1;
        }
    }

I usually use std::vector<std::string> as the second parameter (ContainerT), but list<> can be faster than vector<> when direct access is not needed. You can even create your own string class and use something like std::list<subString>, where subString does not make any copies, for a large speed increase.
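
One way to try the non-copying idea without writing a custom class is to use `std::string_view` as the element type. This is a sketch under two assumptions: a C++17 compiler, and `std::string_view` standing in for the `subString` class mentioned above (the answer itself does not name it). The helper `split_views` is hypothetical.

```cpp
#include <string>
#include <string_view>
#include <vector>

// The tokenize function from above, repeated so the example is self-contained.
template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();
    using value_type = typename ContainerT::value_type;
    using size_type  = typename ContainerT::size_type;

    while (lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos)
            pos = length;
        if (pos != lastPos || !trimEmpty)
            tokens.push_back(value_type(str.data() + lastPos,
                                        (size_type)pos - lastPos));
        lastPos = pos + 1;
    }
}

// Hypothetical helper: non-empty tokens as views into `text` (no copies).
// The views are only valid while `text` is alive and unmodified, so do not
// pass a temporary string here.
std::vector<std::string_view> split_views(const std::string& text,
                                          const std::string& delims)
{
    std::vector<std::string_view> views;
    tokenize(text, views, delims, true);
    return views;
}
```

This works because `std::string_view` can be constructed from a pointer and a length, which is exactly what `tokenize` passes to `value_type`. The trade-off is lifetime: each view points into the original string rather than owning its characters.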

It's more than twice as fast as the fastest tokenizer on this page and almost 5 times faster than some others. With well-chosen parameter types you can also eliminate all string and list copies for additional speed.

Additionally, it does not return the result (which can be costly when the container is copied); instead it takes the token container by reference, which also lets you build up tokens over multiple calls if you wish.

Lastly it allows you to specify whether to trim empty tokens from the results via a last optional parameter.

All it needs is std::string... the rest are optional. It does not use streams or the boost library, but is flexible enough to be able to accept some of these foreign types naturally.
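
The two features described above, the optional `trimEmpty` flag and accumulating tokens across calls, can be sketched as follows. The helper `append_tokens` and the choice of `std::list<std::string>` are assumptions made for illustration only.

```cpp
#include <list>
#include <string>
#include <vector>

// The tokenize function from above, repeated so the example is self-contained.
template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();
    using value_type = typename ContainerT::value_type;
    using size_type  = typename ContainerT::size_type;

    while (lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos)
            pos = length;
        if (pos != lastPos || !trimEmpty)
            tokens.push_back(value_type(str.data() + lastPos,
                                        (size_type)pos - lastPos));
        lastPos = pos + 1;
    }
}

// Hypothetical helper: append the non-empty, space-separated tokens of every
// line to the same output list, one tokenize call per line.
void append_tokens(const std::vector<std::string>& lines,
                   std::list<std::string>& out)
{
    for (const std::string& line : lines)
    {
        tokenize(line, out, " ", true);
    }
}
```

With the default `trimEmpty = false`, tokenizing "a  b" (two spaces) yields three tokens, the middle one empty; with `trimEmpty = true` it yields two. Because `tokenize` only ever appends, calling it repeatedly on the same container accumulates tokens instead of replacing them.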
