parsing - How can I read and parse CSV files in C++?

ID : 10441

viewed : 56

Tags : c++, parsing, text, csv

Top 5 Answers for parsing - How can I read and parse CSV files in C++?

Votes : 99

If you don't care about escaping commas and newlines,
AND you don't need to embed commas or newlines in quoted fields (if you can't escape, then...),
then it's only about three lines of code (OK, 14, but it's only 15 to read the whole file).

    #include <istream>
    #include <sstream>
    #include <string>
    #include <vector>

    std::vector<std::string> getNextLineAndSplitIntoTokens(std::istream& str)
    {
        std::vector<std::string>   result;
        std::string                line;
        std::getline(str,line);

        std::stringstream          lineStream(line);
        std::string                cell;

        while(std::getline(lineStream,cell, ','))
        {
            result.push_back(cell);
        }
        // This checks for a trailing comma with no data after it.
        if (!lineStream && cell.empty())
        {
            // If there was a trailing comma then add an empty element.
            result.push_back("");
        }
        return result;
    }
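To read the whole file with this approach, the per-line split can be called in a loop. The following is a minimal sketch rather than part of the original answer: `splitLine` and `readAllRows` are hypothetical names, and like the function above it assumes no quoted or escaped fields.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line on commas (no quoting or escaping).
std::vector<std::string> splitLine(const std::string& line)
{
    std::vector<std::string> result;
    std::stringstream        lineStream(line);
    std::string              cell;

    while (std::getline(lineStream, cell, ','))
    {
        result.push_back(cell);
    }
    // An empty line or a trailing comma leaves one empty field unread.
    if (line.empty() || line.back() == ',')
    {
        result.push_back("");
    }
    return result;
}

// Hypothetical helper: read every remaining line of the stream into rows.
std::vector<std::vector<std::string>> readAllRows(std::istream& in)
{
    std::vector<std::vector<std::string>> rows;
    std::string                           line;

    while (std::getline(in, line))
    {
        rows.push_back(splitLine(line));
    }
    return rows;
}
```

Testing the stream in the loop condition stops cleanly at end of input, whether or not the last line ends with a newline.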

I would just create a class representing a row.
Then stream into that object:

    #include <iterator>
    #include <iostream>
    #include <fstream>
    #include <sstream>
    #include <vector>
    #include <string>
    #include <string_view>

    class CSVRow
    {
        public:
            // Returns a view into the stored line; valid until the next readNextRow().
            std::string_view operator[](std::size_t index) const
            {
                return std::string_view(&m_line[m_data[index] + 1], m_data[index + 1] -  (m_data[index] + 1));
            }
            std::size_t size() const
            {
                return m_data.size() - 1;
            }
            void readNextRow(std::istream& str)
            {
                std::getline(str, m_line);

                m_data.clear();
                m_data.emplace_back(-1);
                std::string::size_type pos = 0;
                while((pos = m_line.find(',', pos)) != std::string::npos)
                {
                    m_data.emplace_back(pos);
                    ++pos;
                }
                // This checks for a trailing comma with no data after it.
                pos   = m_line.size();
                m_data.emplace_back(pos);
            }
        private:
            std::string         m_line;
            std::vector<int>    m_data;
    };

    std::istream& operator>>(std::istream& str, CSVRow& data)
    {
        data.readNextRow(str);
        return str;
    }

    int main()
    {
        std::ifstream       file("plop.csv");

        CSVRow              row;
        while(file >> row)
        {
            std::cout << "4th Element(" << row[3] << ")\n";
        }
    }
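The row class above stores delimiter positions instead of copying each field. That idea can be sketched in isolation as follows. `fieldBoundaries` and `fieldAt` are hypothetical names introduced only for illustration, and the sketch makes the same assumption of no quoted or escaped commas.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: record the boundaries of every field in a line.
// For N fields the result holds N+1 entries: entry i is the index of the
// delimiter before field i (-1 for the first field), and the last entry
// is line.size().
std::vector<std::ptrdiff_t> fieldBoundaries(const std::string& line)
{
    std::vector<std::ptrdiff_t> bounds{-1};
    for (std::size_t pos = line.find(',');
         pos != std::string::npos;
         pos = line.find(',', pos + 1))
    {
        bounds.push_back(static_cast<std::ptrdiff_t>(pos));
    }
    bounds.push_back(static_cast<std::ptrdiff_t>(line.size()));
    return bounds;
}

// Hypothetical helper: view field `index` without copying it.
// The view is only valid while `line` is alive and unmodified.
std::string_view fieldAt(const std::string& line,
                         const std::vector<std::ptrdiff_t>& bounds,
                         std::size_t index)
{
    const std::size_t start = static_cast<std::size_t>(bounds[index] + 1);
    const std::size_t end   = static_cast<std::size_t>(bounds[index + 1]);
    return std::string_view(line).substr(start, end - start);
}
```

Because the views point into the original string, reading the next line invalidates them; copy a field into a `std::string` if it has to outlive the row.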

But with a little work we could technically create an iterator:

    class CSVIterator
    {
        public:
            typedef std::input_iterator_tag     iterator_category;
            typedef CSVRow                      value_type;
            typedef std::size_t                 difference_type;
            typedef CSVRow*                     pointer;
            typedef CSVRow&                     reference;

            CSVIterator(std::istream& str)  :m_str(str.good()?&str:NULL) { ++(*this); }
            CSVIterator()                   :m_str(NULL) {}

            // Pre Increment
            CSVIterator& operator++()               {if (m_str) { if (!((*m_str) >> m_row)){m_str = NULL;}}return *this;}
            // Post increment
            CSVIterator operator++(int)             {CSVIterator    tmp(*this);++(*this);return tmp;}
            CSVRow const& operator*()   const       {return m_row;}
            CSVRow const* operator->()  const       {return &m_row;}

            // Two iterators compare equal if they are the same object or both are at the end.
            bool operator==(CSVIterator const& rhs) const {return ((this == &rhs) || ((this->m_str == NULL) && (rhs.m_str == NULL)));}
            bool operator!=(CSVIterator const& rhs) const {return !((*this) == rhs);}
        private:
            std::istream*       m_str;
            CSVRow              m_row;
    };


    int main()
    {
        std::ifstream       file("plop.csv");

        for(CSVIterator loop(file); loop != CSVIterator(); ++loop)
        {
            std::cout << "4th Element(" << (*loop)[3] << ")\n";
        }
    }

Now that we are in 2020, let's add a CSVRange object:

    class CSVRange
    {
        std::istream&   stream;
        public:
            CSVRange(std::istream& str)
                : stream(str)
            {}
            CSVIterator begin() const {return CSVIterator{stream};}
            CSVIterator end()   const {return CSVIterator{};}
    };

    int main()
    {
        std::ifstream       file("plop.csv");

        for(auto& row: CSVRange(file))
        {
            std::cout << "4th Element(" << row[3] << ")\n";
        }
    }

Votes : 81

My version uses nothing but the standard C++11 library. It copes well with Excel-style CSV quoting:

    spam eggs,"foo,bar","""fizz buzz"""
    1.23,4.567,-8.00E+09

The code is written as a finite-state machine that consumes one character at a time. I think that makes it easier to reason about.

    #include <istream>
    #include <string>
    #include <vector>

    enum class CSVState {
        UnquotedField,
        QuotedField,
        QuotedQuote
    };

    std::vector<std::string> readCSVRow(const std::string &row) {
        CSVState state = CSVState::UnquotedField;
        std::vector<std::string> fields {""};
        size_t i = 0; // index of the current field
        for (char c : row) {
            switch (state) {
                case CSVState::UnquotedField:
                    switch (c) {
                        case ',': // end of field
                                  fields.push_back(""); i++;
                                  break;
                        case '"': state = CSVState::QuotedField;
                                  break;
                        default:  fields[i].push_back(c);
                                  break; }
                    break;
                case CSVState::QuotedField:
                    switch (c) {
                        case '"': state = CSVState::QuotedQuote;
                                  break;
                        default:  fields[i].push_back(c);
                                  break; }
                    break;
                case CSVState::QuotedQuote:
                    switch (c) {
                        case ',': // , after closing quote
                                  fields.push_back(""); i++;
                                  state = CSVState::UnquotedField;
                                  break;
                        case '"': // "" -> "
                                  fields[i].push_back('"');
                                  state = CSVState::QuotedField;
                                  break;
                        default:  // end of quote
                                  state = CSVState::UnquotedField;
                                  break; }
                    break;
            }
        }
        return fields;
    }

    /// Read CSV file, Excel dialect. Accept "quoted fields ""with quotes"""
    std::vector<std::vector<std::string>> readCSV(std::istream &in) {
        std::vector<std::vector<std::string>> table;
        std::string row;
        // Stop as soon as a line cannot be read (end of file or stream error).
        while (std::getline(in, row)) {
            auto fields = readCSVRow(row);
            table.push_back(fields);
        }
        return table;
    }

Votes : 79

Solution using Boost Tokenizer:

    #include <boost/tokenizer.hpp>
    #include <string>
    #include <vector>

    // `line` is a std::string holding one row of the CSV file.
    std::vector<std::string> vec;
    using namespace boost;
    tokenizer<escaped_list_separator<char> > tk(
       line, escaped_list_separator<char>('\\', ',', '\"'));
    for (tokenizer<escaped_list_separator<char> >::iterator i(tk.begin());
       i!=tk.end();++i)
    {
       vec.push_back(*i);
    }

Votes : 65

The C++ String Toolkit Library (StrTk) has a token grid class that allows you to load data either from text files, strings or char buffers, and to parse/process them in a row-column fashion.

You can specify the row delimiters and column delimiters or just use the defaults.

    void foo()
    {
       std::string data = "1,2,3,4,5\n"
                          "0,2,4,6,8\n"
                          "1,3,5,7,9\n";

       strtk::token_grid grid(data,data.size(),",");

       for(std::size_t i = 0; i < grid.row_count(); ++i)
       {
          strtk::token_grid::row_type r = grid.row(i);
          for(std::size_t j = 0; j < r.size(); ++j)
          {
             std::cout << r.get<int>(j) << "\t";
          }
          std::cout << std::endl;
       }
       std::cout << std::endl;
    }

More examples can be found Here

Votes : 51

You can use Boost Tokenizer with escaped_list_separator.

escaped_list_separator parses a superset of the CSV format (see Boost::tokenizer).

This uses only the Boost Tokenizer header files; no linking against Boost libraries is required.

Here is an example (see Parse CSV File With Boost Tokenizer In C++ or Boost::tokenizer for details):

    #include <iostream>     // cout, endl
    #include <fstream>      // ifstream
    #include <vector>
    #include <string>
    #include <algorithm>    // copy
    #include <iterator>     // ostream_iterator
    #include <boost/tokenizer.hpp>

    int main()
    {
        using namespace std;
        using namespace boost;
        string data("data.csv");

        ifstream in(data.c_str());
        if (!in.is_open()) return 1;

        typedef tokenizer< escaped_list_separator<char> > Tokenizer;
        vector< string > vec;
        string line;

        while (getline(in,line))
        {
            Tokenizer tok(line);
            vec.assign(tok.begin(),tok.end());

            // vector now contains strings from one row, output to cout here
            copy(vec.begin(), vec.end(), ostream_iterator<string>(cout, "|"));

            cout << "\n----------------------" << endl;
        }
    }
