pub fn avx_transpose128x128(in_out: &mut [__m256i; 64])
avx2
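Transposes a 128x128 bit matrix in place using AVX2. The 64 `__m256i` values in `in_out` together hold the 128 × 128 = 16,384 bits of the matrix.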
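To make the intended result concrete, here is a minimal portable sketch of a 128x128 bit transpose. It is not this crate's implementation: the helper name `transpose128x128_scalar`, the `[u128; 128]` representation, and the convention that bit `c` of row `r` is the entry at (row `r`, column `c`) are all assumptions for illustration. The documentation above does not state how rows are packed into the `__m256i` values.

```rust
/// Hypothetical scalar reference (not part of the documented API).
/// Assumed layout: `rows[r]` is row `r`, and bit `c` of it is column `c`.
fn transpose128x128_scalar(rows: &[u128; 128]) -> [u128; 128] {
    let mut out = [0u128; 128];
    for (r, &row) in rows.iter().enumerate() {
        for (c, out_row) in out.iter_mut().enumerate() {
            // Copy the bit at (r, c) of the input to (c, r) of the output.
            *out_row |= ((row >> c) & 1) << r;
        }
    }
    out
}
```

A reference like this can be useful for checking a vectorized routine in tests, since transposing twice must return the original matrix.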
AVX2 must be enabled to use this function.
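How AVX2 gets enabled for this function is not spelled out here. As a hedged sketch, a caller who wants to confirm at run time that the CPU supports AVX2 before taking an AVX2 code path could use the standard library's `is_x86_feature_detected!` macro. The helper name `avx2_available` is hypothetical and not part of this crate.

```rust
/// Hypothetical helper: reports whether the running CPU supports AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn avx2_available() -> bool {
    // Runtime CPU feature detection provided by the standard library.
    std::is_x86_feature_detected!("avx2")
}

/// On non-x86 targets AVX2 does not exist, so always report false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn avx2_available() -> bool {
    false
}
```

A caller might branch on `avx2_available()` and fall back to a portable implementation when it returns `false`.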