Because of their internal architecture, many microprocessors and microcontrollers (MCUs) can only access data that is aligned to the native word (or data bus) size. For example, a 32-bit MCU can load and store 32-bit words (4 bytes) only from/to addresses that are multiples of 4. Attempting an unaligned data access can raise an exception that halts program execution. In other cases there is no exception, but the load or store operation silently produces an unexpected result.
Padding
To avoid unaligned data accesses, compilers are careful to place data in appropriate locations. This is not a big problem with arrays: as long as the start address is aligned properly, all array elements are also aligned properly. It does get trickier, however, with data structures, which typically mix data types of 8-bit, 16-bit, 32-bit, and sometimes even 64-bit size. In this case the compiler will automatically insert padding bytes to make sure that each structure member is placed at an appropriate address, typically a multiple of the member's data size.
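As a rough illustration of the padding described above, the following sketch prints the layout a compiler chooses for a small mixed-type structure. The structure `sample_t`, its member names, and the helper `print_sample_layout` are hypothetical, made up for this example; the exact offsets depend on the target ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical example structure mixing 8-, 32-, and 16-bit members. */
typedef struct {
    uint8_t  flag;   /* 1 byte; padding is usually inserted after it */
    uint32_t count;  /* must start at a multiple of its alignment */
    uint16_t id;     /* 2 bytes; tail padding usually follows */
} sample_t;

/* Print the offsets and total size the compiler actually chose. */
void print_sample_layout(void) {
    printf("flag  at offset %zu\n", offsetof(sample_t, flag));
    printf("count at offset %zu\n", offsetof(sample_t, count));
    printf("id    at offset %zu\n", offsetof(sample_t, id));
    printf("sizeof(sample_t) = %zu\n", sizeof(sample_t));
}
```

On a typical ABI where `uint32_t` is 4-byte aligned, one would expect offsets 0, 4, and 8 and a total size of 12: three padding bytes after `flag` and two tail bytes after `id`, for only 7 bytes of actual data.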
Packing
Padding inside data structures avoids unaligned data accesses but increases the size of the data structure. This is often important in embedded systems with fairly constrained memory resources measured in kilobytes (KB), not megabytes (MB) or gigabytes (GB). To save precious memory, one can use packed structures by adding an appropriate compiler attribute to the structure definition.
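To make the size trade-off concrete, here is a minimal sketch comparing a padded structure with its packed twin. It assumes a GCC- or Clang-compatible compiler that accepts `__attribute__((packed))`; the type names `padded_sample_t` and `packed_sample_t` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Default layout: the compiler may insert padding between members. */
typedef struct {
    uint8_t  flag;
    uint32_t count;
    uint16_t id;
} padded_sample_t;

/* Same members, but packed: no padding bytes are inserted. */
typedef struct {
    uint8_t  flag;
    uint32_t count;
    uint16_t id;
} __attribute__((packed)) packed_sample_t;

/* The packed size is exactly the sum of the member sizes: 1 + 4 + 2. */
_Static_assert(sizeof(packed_sample_t) == 7, "packed struct has no padding");

/* The padded version can never be smaller than the packed one. */
_Static_assert(sizeof(padded_sample_t) >= sizeof(packed_sample_t),
               "padding only adds bytes");
```

The price of the saved bytes is that `count` and `id` may now sit at unaligned addresses, so the compiler has to generate unaligned-safe (and possibly slower) code for every access to them.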
A packed structure data type can also be used to access possibly unaligned data words. Here is a simple packed data structure containing one 32-bit data field. It can be used with typecasting to force the compiler to generate code for unaligned data accesses:
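    #include <stdint.h>

    /* Packed wrapper: tells the compiler that u may sit at any byte address. */
    typedef struct {
        int32_t u;
    } __attribute__((packed)) unaligned;

    /* Load a 32-bit word from a possibly unaligned address. */
    int32_t unaligned_load(void *ptr) {
        unaligned *uptr = (unaligned *)ptr;
        return uptr->u;
    }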
With this approach there is no need to load the data byte by byte and then assemble the entire data word manually. We let the compiler choose the best instruction sequence for the target. For example, on MIPS processors the compiler will use a pair of LWL/LWR instructions: see the MIPS Instruction Set Quick Reference that I created some time ago when I worked at the MIPS Processor Architecture Group.
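As a usage sketch, the packed-structure load can be applied to a field that starts at an odd offset inside a byte buffer. The `unaligned` type is repeated here so the block stands alone (with `const` added to the pointer parameter); `memcpy_load` and `read_field_at_offset_1` are hypothetical helpers added for this example, and the buffer layout is an assumption, not something from a real protocol.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Packed wrapper from the text above (GCC/Clang attribute syntax). */
typedef struct { int32_t u; } __attribute__((packed)) unaligned;

/* Load a 32-bit word from a possibly unaligned address. */
int32_t unaligned_load(const void *ptr) {
    const unaligned *uptr = (const unaligned *)ptr;
    return uptr->u;
}

/* Hypothetical comparison helper: the same load expressed with memcpy,
 * which is the strictly portable way to read unaligned data in C. */
int32_t memcpy_load(const void *ptr) {
    int32_t value;
    memcpy(&value, ptr, sizeof value);
    return value;
}

/* Hypothetical example: a 32-bit field that starts at byte offset 1
 * of a received buffer, i.e. at an address that is not a multiple of 4
 * when the buffer itself is word aligned. */
int32_t read_field_at_offset_1(const uint8_t *buf) {
    return unaligned_load(buf + 1);
}
```

Both helpers should return the same value in the target's native byte order. The `memcpy` variant is worth knowing as a fallback for compilers that do not support the packed attribute, and it also sidesteps any strict-aliasing questions raised by the pointer cast.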