How to convert a byte vector into a 32-bit word vector

How do you convert a vector of 8bit values (bytes) to a vector of 32bit values (words)?

Why would you want to do such a thing? Well, maybe you have a byte stream that needs to be sent over a 32-bit bus.

You could use the std::vector constructor to construct a new vector on the fly.

std::vector<uint8_t> bytes{ 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88 };
std::vector<uint32_t> words((uint32_t*)bytes.data(), (uint32_t*)(bytes.data() + bytes.size()*sizeof(uint8_t)));

This works at first glance, but has some notable issues:
– It truncates any remaining bytes when the vector length is not a multiple of 4. Well, at least it doesn’t crash!
– The ctor reads each word in the host's byte order. On a little-endian machine every word comes out byte-reversed, e.g. {0x44332211, 0x88776655}, rather than the {0x11223344, 0x55667788} we want for a big-endian (first byte first) word stream.
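
For comparison, here is a minimal sketch of the same reinterpretation done with std::memcpy, which sidesteps the pointer cast (reading uint8_t storage through a uint32_t* is itself undefined behaviour). The helper name reinterpret_as_words is made up for illustration. It shows the same two symptoms: trailing bytes are dropped and each word takes the host's byte order.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of the original post): copies whole 4-byte
// groups into words using the host's native byte order.
// Any trailing 1-3 bytes are silently dropped.
std::vector<uint32_t> reinterpret_as_words(const std::vector<uint8_t>& bytes)
{
    std::vector<uint32_t> words(bytes.size() / sizeof(uint32_t));
    if (!words.empty())
    {
        // memcpy avoids the misaligned, aliasing-violating pointer cast.
        std::memcpy(words.data(), bytes.data(), words.size() * sizeof(uint32_t));
    }
    return words;
}
```

On a little-endian host the first word of { 0x11, 0x22, 0x33, 0x44 } comes back as 0x44332211, which is why a raw memory reinterpretation is not enough when the word layout matters.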

A better solution is to use a loop that steps through the vector 4 bytes at a time and bit-shifts(!) each byte by 24/16/8/0 bits into a new 32-bit word. However, if the length of the byte vector is not a multiple of 4 then you get a segfault/exception. This is because stepping the iterator by 4 can jump straight past vector.end(), so the usual it != bytes.end() guard never fires, and advancing or dereferencing an iterator beyond the end is undefined behaviour.
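
One way to keep the 4-byte stride without running off the end is to loop on an index and compare against size(). The sketch below is only an illustration (pack_whole_words is a made-up name); it packs whole groups of 4 bytes and simply ignores any leftover tail.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs whole 4-byte groups into words,
// first byte most significant. Leftover bytes are ignored.
std::vector<uint32_t> pack_whole_words(const std::vector<uint8_t>& bytes)
{
    std::vector<uint32_t> words;
    words.reserve(bytes.size() / 4);
    // The guard checks that a full group fits, so a stride of 4 can never
    // read past the end of the vector.
    for (std::size_t i = 0; i + 4 <= bytes.size(); i += 4)
    {
        words.push_back(
              static_cast<uint32_t>(bytes[i])     << 24
            | static_cast<uint32_t>(bytes[i + 1]) << 16
            | static_cast<uint32_t>(bytes[i + 2]) << 8
            | static_cast<uint32_t>(bytes[i + 3])
        );
    }
    return words;
}
```

This is safe for any input length, but it still leaves open the question of what to do with the leftover bytes.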

In the real world this is a problem, as a byte stream whose length is not a multiple of 4 is highly likely.

So, if you want to convert a byte vector whose length is not a multiple of 4 then you have to make a decision: pad with zero-value bytes, or truncate back to the nearest length that is a multiple of 4. After all, when a 32-bit word is the smallest possible unit for a result, there is no such thing as a byte.

Here is a function that pads/truncates and then converts:

#include <cstddef>
#include <cstdint>
#include <vector>

// Note, we are passing `bytes` by value because we want a **copy** we can modify.
void bytes_to_words(std::vector<uint8_t> bytes, std::vector<uint32_t> &words, bool zero_pad = true)
{
    const std::size_t word_width = sizeof(uint32_t);
    const std::size_t byte_mod = bytes.size() % word_width;
    if(zero_pad)
    {
        // pad the last word with zero-value bytes
        if(byte_mod) bytes.insert(bytes.end(), word_width - byte_mod, 0x00);
    }
    else
    {
        // truncate the byte vector down to the nearest whole word
        bytes.resize(bytes.size() - byte_mod);
    }
    // bytes.size() is now a multiple of 4, so stepping by word_width lands exactly on end()
    for (auto it = bytes.begin(); it != bytes.end(); it += word_width)
    {
        // widen each byte to uint32_t before shifting so the top bit never ends up in a signed int
        words.push_back(
              static_cast<uint32_t>(*(it))   << 24
            | static_cast<uint32_t>(*(it+1)) << 16
            | static_cast<uint32_t>(*(it+2)) << 8
            | static_cast<uint32_t>(*(it+3))
        );
    }
}

Input:

std::vector<uint8_t> bytes1{ 0x11, 0x22, 0x33, 0x44, 0x55};

The padded result:

0x11223344 0x55000000

The truncated result:

0x11223344
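
If copying the input is a concern, the same pad/truncate decision can be made per word instead of by resizing the byte vector. The following is a sketch under that assumption, not a replacement for the function above; the name bytes_to_words_no_copy and its return-by-value signature are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical variant: leaves the input untouched and returns the words.
std::vector<uint32_t> bytes_to_words_no_copy(const std::vector<uint8_t>& bytes,
                                             bool zero_pad = true)
{
    const std::size_t width = sizeof(uint32_t);
    const std::size_t whole = bytes.size() / width;  // complete words
    const std::size_t tail  = bytes.size() % width;  // leftover bytes
    // Emit one extra word only when padding is requested and there is a tail.
    const std::size_t count = whole + ((zero_pad && tail != 0) ? 1 : 0);

    std::vector<uint32_t> words;
    words.reserve(count);
    for (std::size_t w = 0; w < count; ++w)
    {
        uint32_t word = 0;
        for (std::size_t b = 0; b < width; ++b)
        {
            const std::size_t i = w * width + b;
            // Positions past the end of the input act as zero padding.
            const uint32_t value = (i < bytes.size()) ? bytes[i] : 0u;
            word = (word << 8) | value;
        }
        words.push_back(word);
    }
    return words;
}
```

Called with the five-byte input above, this should give the same padded and truncated results as bytes_to_words.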
