
How much of a file has been read

I have a program that reads a 10 MB file and processes the data as it is read, in 4 KB chunks. The test usually takes 1 to 2 minutes, but occasionally the program runs for more than 10 minutes, at which point the test is killed and a core is generated. The following code reads the file:

    string filename("data.out");
    ifstream ifs;
    vector<char> buf(4096);

    ifs.open(filename,  ios::in | ios::binary);
    if (!ifs.is_open()) {
            cout << "ERROR : " << filename << " can't be opened." << endl;
            VERIFY(ifs.is_open());
    }

    while (!ifs.eof()) {
            ifs.read(buf.data(), buf.size());          // <======== Line 1
            process_data (buf.data(), ifs.gcount());   // <======== Line 2
    }
    ifs.close();

I have two cores that show the program is stuck at Line 1 and Line 2.

Top of bt of core1 at Line 1:

#0  0x00007f942a462175 in std::istream::read (this=0x7fff4ce69de0,
__s=0x9120000 "\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324\324"..., __n=4096) at /home/packages/gcc/4.7/w/gcc-4.7-4.7.2/build/x86_64-linux-gnu/libstdc++-v3/include/bits/istream.tcc:651

Top of bt of core2 at Line 2:

#0  0x00000000004375f3 in std::__addressof<char> (__r=@0x7fa3176391a6: -128 '\200') at /usr/include/c++/4.7/bits/move.h:47
#1  0x0000000000436cd4 in std::vector<char, std::allocator<char> >::data (this=0x7fff346ad770)
at /usr/include/c++/4.7/bits/stl_vector.h:859

Initially, from core1, I thought the issue was with ifs.read() taking a long time. But after the second core, I am thinking the issue might be related to vector::data().

Is there a way to tell how much of the file has been read, if any, by inspecting fields stored in the ifstream (e.g. the file offset)?

I don't like posting a dump of a large structure, but here it is in case someone can shed some light on how to work out from it how much of the 10 MB has been read.

(gdb) p ifs
$3 = warning: can't find linker symbol for virtual table for `std::basic_ifstream<char, std::char_traits<char> >' value
{
  <std::basic_istream<char, std::char_traits<char> >> = {
    <std::basic_ios<char, std::char_traits<char> >> = {
      <std::ios_base> = {
        _vptr.ios_base = 0xfbfcc0,
        static boolalpha = std::_S_boolalpha,
        static dec = std::_S_dec,
        static fixed = std::_S_fixed,
        static hex = std::_S_hex,
        static internal = std::_S_internal,
        static left = std::_S_left,
        static oct = std::_S_oct,
        static right = std::_S_right,
        static scientific = std::_S_scientific,
        static showbase = std::_S_showbase,
        static showpoint = std::_S_showpoint,
        static showpos = std::_S_showpos,
        static skipws = std::_S_skipws,
        static unitbuf = std::_S_unitbuf,
        static uppercase = std::_S_uppercase,
        static adjustfield = std::_S_adjustfield,
        static basefield = std::_S_basefield,
        static floatfield = std::_S_floatfield,
        static badbit = std::_S_badbit,
        static eofbit = std::_S_eofbit,
        static failbit = std::_S_failbit,
        static goodbit = std::_S_goodbit,
        static app = std::_S_app,
        static ate = std::_S_ate,
        static binary = std::_S_bin,
        static in = std::_S_in,
        static out = std::_S_out,
        static trunc = std::_S_trunc,
        static beg = std::_S_beg,
        static cur = std::_S_cur,
        static end = std::_S_end,
        _M_precision = 6,
        _M_width = 0,
        _M_flags = 4098,
        _M_exception = std::_S_goodbit,
        _M_streambuf_state = 5,
        _M_callbacks = 0x0,
        _M_word_zero = {
          _M_pword = 0x0,
          _M_iword = 0
        },
        _M_local_word = {{
            _M_pword = 0x0,
            _M_iword = 0
          }, {
            _M_pword = 0x0,
            _M_iword = 0
          }, {
            _M_pword = 0x0,
            _M_iword = 0
          }, {
            _M_pword = 0x0,
            _M_iword = 0
          }, {
            _M_pword = 0x0,
            _M_iword = 0
          }, {
            _M_pword = 0x0,
            _M_iword = 0
          }, {
            _M_pword = 0x0,
            _M_iword = 0
          }, {
            _M_pword = 0x0,
            _M_iword = 0
          }},
        _M_word_size = 8,
        _M_word = 0x7fff4ce69f20,
        _M_ios_locale = {
          static none = 0,
          static ctype = 1,
          static numeric = 2,
          static collate = 4,
          static time = 8,
          static monetary = 16,
          static messages = 32,
          static all = 63,
          _M_impl = 0x7f942a6e3aa0,
          static _S_classic = 0x7f942a6e3aa0,
          static _S_global = 0x7f942a6e3aa0,
          static _S_categories = 0x7f942a6c86a0,
          static _S_once = 2
        }
      },
      members of std::basic_ios<char, std::char_traits<char> >:
      _M_tie = 0x0,
      _M_fill = 0 '\000',
      _M_fill_init = false,
      _M_streambuf = 0x7fff4ce69df0,
      _M_ctype = 0x7f942a6e3d20,
      _M_num_put = 0x7f942a6e4040,
      _M_num_get = 0x7f942a6e4030
    },
    members of std::basic_istream<char, std::char_traits<char> >:
    _vptr.basic_istream = 0xfbfc98,
    _M_gcount = 0
  },
  members of std::basic_ifstream<char, std::char_traits<char> >:
  _M_filebuf = warning: can't find linker symbol for virtual table for `std::basic_filebuf<char, std::char_traits<char> >' value
{
    <std::basic_streambuf<char, std::char_traits<char> >> = {
      _vptr.basic_streambuf = 0xfc0a70,
      _M_in_beg = 0x6306000 "\317\317\317\......320\320\320\320"...,
      _M_in_cur = 0x6307fff "",
      _M_in_end = 0x6307fff "",
      _M_out_beg = 0x0,
      _M_out_cur = 0x0,
      _M_out_end = 0x0,
      _M_buf_locale = {
        static none = 0,
        static ctype = 1,
        static numeric = 2,
        static collate = 4,
        static time = 8,
        static monetary = 16,
        static messages = 32,
        static all = 63,
        _M_impl = 0x7f942a6e3aa0,
        static _S_classic = 0x7f942a6e3aa0,
        static _S_global = 0x7f942a6e3aa0,
        static _S_categories = 0x7f942a6c86a0,
        static _S_once = 2
      }
    },
    members of std::basic_filebuf<char, std::char_traits<char> >:
    _M_lock = {
      __data = {
        __lock = 0,
        __count = 0,
        __owner = 0,
        __nusers = 0,
        __kind = 0,
        __spins = 0,
        __list = {
          __prev = 0x0,
          __next = 0x0
        }
      },
      __size = '\000' <repeats 39 times>,
      __align = 0
    },
    _M_file = {
      _M_cfile = 0x70186c0,
      _M_cfile_created = true
    },
    _M_mode = 12,
    _M_state_beg = {
      __count = 0,
      __value = {
        __wch = 0,
        __wchb = "\000\000\000"
      }
    },
    _M_state_cur = {
      __count = 0,
      __value = {
        __wch = 0,
        __wchb = "\000\000\000"
      }
    },
    _M_state_last = {
      __count = 0,
      __value = {
        __wch = 0,
        __wchb = "\000\000\000"
      }
    },
    _M_buf = 0x6306000 "\317\317\317\317\317\......320\320\320\320\320"...,
    _M_buf_size = 8192,
    _M_buf_allocated = true,
    _M_reading = true,
    _M_writing = false,
    _M_pback = 0 '\000',
    _M_pback_cur_save = 0x0,
    _M_pback_end_save = 0x0,
    _M_pback_init = false,
    _M_codecvt = 0x7f942a6e3f60,
    _M_ext_buf = 0x0,
    _M_ext_buf_size = 0,
    _M_ext_next = 0x0,
    _M_ext_end = 0x0
  }
}
(gdb)

Thank you, Ahmed.

Do not loop on eof.

    // read() leaves the stream false after a short final read, so also check
    // gcount(); otherwise the last partial chunk would be dropped.
    while (ifs.read(buf.data(), buf.size()) || ifs.gcount() > 0) {
      size_t read = ifs.gcount();
      process_data(buf.data(), read);
      if (read < buf.size()) break; // short read: we finished, end.
    }

The end of input is best detected by attempting the I/O and noticing that it came up short. Here we read, count how many bytes we actually got, and once we get 0 bytes or fewer bytes than we asked for, we conclude there is no more data to come.
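As a rough sketch of the same idea in self-contained form, the function below reads any std::istream in fixed-size chunks and returns a running total of the bytes it handed on, which is one way to know from inside the program how much has been read. The names read_in_chunks and demo_total are made up for illustration, and the process callback only stands in for the question's process_data.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads `in` in chunks of `chunk_size` bytes, passes each
// non-empty chunk to `process`, and returns the total number of bytes passed on.
std::size_t read_in_chunks(
    std::istream& in, std::size_t chunk_size,
    const std::function<void(const char*, std::size_t)>& process) {
    std::vector<char> buf(chunk_size);
    std::size_t total = 0;
    while (true) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;      // nothing delivered: end of input or an error
        process(buf.data(), got); // also covers the final, partial chunk
        total += got;
        if (!in) break;           // short read: eofbit/failbit/badbit is set
    }
    return total;
}

// Illustration only: feeds an in-memory stream of `n` bytes through the loop.
std::size_t demo_total(std::size_t n) {
    std::istringstream in(std::string(n, 'x'));
    return read_in_chunks(in, 4096, [](const char*, std::size_t) {});
}
```

With an ifstream opened in binary mode the call would look the same; logging the returned total (or the running total inside the callback) shows how far the read got before it stopped.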

The loop also ends if the read sets failbit or badbit on ifs without delivering any data, which a test on eof alone would never notice.
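To see why that matters, here is a small sketch of what an eof-only loop does on a stream that has failed without reaching end of file. The function name count_eof_loop_iterations and the iteration cap are invented for this illustration; the cap exists only so the demonstration cannot hang. As a hedged reading of the dump in the question: if libstdc++ numbers the state bits badbit = 1, eofbit = 2, failbit = 4, as I believe it does, then _M_streambuf_state = 5 together with _M_gcount = 0 would mean badbit and failbit are set while eofbit is clear, which is exactly this situation.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: runs the question's `while (!in.eof())` loop and counts
// its iterations, giving up after `max_iters` so a stuck stream cannot hang us.
std::size_t count_eof_loop_iterations(std::istream& in, std::size_t max_iters) {
    std::vector<char> buf(4096);
    std::size_t iters = 0;
    while (!in.eof() && iters < max_iters) {
        // On a failed stream this reads nothing and leaves gcount() at 0.
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        ++iters;
    }
    return iters;
}

// Illustration only: the same 10000 bytes, once healthy and once with badbit set.
std::size_t iterations_on_healthy_stream() {
    std::istringstream in(std::string(10000, 'x'));
    return count_eof_loop_iterations(in, 1000);
}

std::size_t iterations_on_bad_stream() {
    std::istringstream in(std::string(10000, 'x'));
    in.setstate(std::ios::badbit); // simulate an I/O error before end of file
    return count_eof_loop_iterations(in, 1000);
}
```

On the healthy stream the loop stops after the short third read sets eofbit. On the failed stream every read is a no-op, eofbit is never set, and only the cap stops the loop.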
