IDE Extension

Supercharge SIMD Development in your IDE

Simplify SIMD porting across SSE4.2, AVX, NEON, VSX, and more. Enjoy smart intrinsics highlighting, detailed latency/throughput insights, and in-editor intrinsic documentation.

// compute square root of vector v1
__vector double sqrt_v1 = vec_sqrt(v1);

// extract element 2 from vector v1
double elem = vec_extract(v1, 2);

// take absolute value of vector v2
__vector double abs_v2 = vec_abs(v2);
// compute square root of vector v1
__vector double sqrt_v1 = vec_sqrt(v1);
vec_sqrt (VSX - IBM Power 9 64-bit)
Purpose: Returns a vector containing the square root of each element in the source vector.
Result value: Each element of output is the square root of the corresponding element of a.
Endian considerations: None.
Prototypes:
vector float result = vec_sqrt(vector float a);
vector double result = vec_sqrt(vector double a);

// extract element 2 from vector v1
double elem = vec_extract(v1, 2);
vec_extract (VSX - IBM Power 9 64-bit)
Purpose: Returns the value of the b-th element of vector a.
Result value: The element at position b modulo the number of elements.
Endian considerations: Elements are numbered left to right in big-endian mode and right to left in little-endian mode.
Notes: Prior to ISA 3.0, less efficient sequences are used.
Prototypes:
signed char result = vec_extract(vector signed char a, int b);
unsigned char result = vec_extract(vector unsigned char a, int b);
signed short result = vec_extract(vector signed short a, int b);
unsigned short result = vec_extract(vector unsigned short a, int b);
int result = vec_extract(vector signed int a, int b);
unsigned int result = vec_extract(vector unsigned int a, int b);
long long result = vec_extract(vector signed long long a, int b);
unsigned long long result = vec_extract(vector unsigned long long a, int b);
float result = vec_extract(vector float a, int b);
double result = vec_extract(vector double a, int b);

// take absolute value of vector v2
__vector double abs_v2 = vec_abs(v2);
vec_abs (VSX - IBM Power 9 64-bit)
Purpose: Returns a vector containing absolute values of the elements in a.
Result value: Each element is the absolute value of the corresponding input element.
Notes: For integers, arithmetic is modular.
Prototypes:
vector signed int result = vec_abs(vector signed int a);
vector float result = vec_abs(vector float a);
vector double result = vec_abs(vector double a);
vector signed char result = vec_abs(vector signed char a);
vector signed short result = vec_abs(vector signed short a);
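The modulo rule in the vec_extract result description can be illustrated with a scalar model. The sketch below is not the VSX intrinsic: `extract_mod` is a hypothetical helper that only mirrors the stated indexing rule, and it assumes a non-negative index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of the indexing rule quoted above:
   the selected lane is b modulo the number of elements.
   This is an illustration only, not the real vec_extract. */
static double extract_mod(const double *lanes, size_t lane_count, int b)
{
    /* Assumption of this sketch: at least one lane, non-negative index. */
    assert(lane_count > 0 && b >= 0);
    return lanes[(size_t)b % lane_count];
}
```

If the rule applies as stated, index 2 on a two-element vector double wraps to element 0, so a call such as `vec_extract(v1, 2)` on a `__vector double` would read the first lane rather than a third one.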
Intrinsics highlighting
// compute square root of vector v1
float32x4_t sqrt_v1 = vsqrtq_f32(v1);

// extract element 2 from vector v1
float32_t elem = vgetq_lane_f32(v1, 2);

// take absolute value of vector v2
float32x4_t abs_v2 = vabsq_f32(v2);
// compute square root of vector v1
float32x4_t sqrt_v1 = vsqrtq_f32(v1);
vsqrtq_f32 (Neon - Arm)
Purpose: Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
Result value: float32x4_t
Prototype:
float32x4_t result = vsqrtq_f32(float32x4_t a);

// extract element 2 from vector v1
float32_t elem = vgetq_lane_f32(v1, 2);
vgetq_lane_f32 (Neon - Arm)
Purpose: Extracts a single element from a vector at the specified lane position. The lane index must be an immediate value within the valid range for the vector size.
Result value: float32_t
Prototype:
float32_t result = vgetq_lane_f32(float32x4_t a, const int b);

// take absolute value of vector v2
float32x4_t abs_v2 = vabsq_f32(v2);
vabsq_f32 (Neon - Arm)
Purpose: Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
Result value: float32x4_t
Prototype:
float32x4_t result = vabsq_f32(float32x4_t a);
// compute square root of vector v1
__m128 sqrt_v1 = _mm_sqrt_ps(v1);

// extract element 2 from vector v1
int elem = _mm_extract_ps(v1, 2);

// take absolute value of vector v2
__m128 abs_v2 = _mm_andnot_ps(_mm_set1_ps(-0.0f), v2);
// compute square root of vector v1
__m128 sqrt_v1 = _mm_sqrt_ps(v1);
_mm_sqrt_ps (SSE4.2 - Intel)
Purpose: Compute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in output.
Result value: __m128
Prototype:
__m128 result = _mm_sqrt_ps(__m128 a);

// extract element 2 from vector v1
int elem = _mm_extract_ps(v1, 2);
_mm_extract_ps (SSE4.2 - Intel)
Purpose: Extract a single-precision (32-bit) floating-point element from a, selected with b, and store the result in output.
Result value: int
Prototype:
int result = _mm_extract_ps(__m128 a, const int b);
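Because the prototype returns int, the value is the raw 32-bit pattern of the selected lane rather than a converted number. The sketch below shows one way to turn those bits back into a float. `lane2_as_float` is a hypothetical helper, the intrinsic path is compiled only when SSE4.1 is enabled, and a scalar fallback is assumed otherwise.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

/* The bit copy below assumes float and int32_t have the same size. */
static_assert(sizeof(float) == sizeof(int32_t), "float must be 32 bits");

/* Hypothetical helper: returns lane 2 of four floats as a float. */
static float lane2_as_float(const float v[4])
{
    int32_t bits;
#if defined(__SSE4_1__)
    __m128 x = _mm_loadu_ps(v);         /* unaligned load of four floats */
    bits = _mm_extract_ps(x, 2);        /* raw bit pattern of lane 2 */
#else
    memcpy(&bits, &v[2], sizeof bits);  /* scalar fallback: same bits */
#endif
    float out;
    memcpy(&out, &bits, sizeof out);    /* reinterpret the bits as a float */
    return out;
}
```

Using `memcpy` for the reinterpretation avoids pointer casts that would break strict aliasing. Assigning the int to a float directly would convert the integer value instead of preserving the bits.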

// take absolute value of vector v2
__m128 abs_v2 = _mm_andnot_ps(_mm_set1_ps(-0.0f), v2);
_mm_andnot_ps (SSE4.2 - Intel)
Purpose: Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b, and store the results in output.
Result value: __m128
Prototype:
__m128 result = _mm_andnot_ps(__m128 a, __m128 b);
_mm_set1_ps (SSE4.2 - Intel)
Purpose: Broadcast single-precision (32-bit) floating-point value a to all elements of output.
Result value: __m128
Prototype:
__m128 result = _mm_set1_ps(float a);
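The three panels express the same operations on different architectures. As a rough illustration of what porting between them involves, the sketch below wraps absolute value followed by square root behind one function and selects an implementation with compiler-defined macros. `sqrt_abs4` is a hypothetical name, the macro checks are an assumption about a GCC or Clang toolchain, and the scalar branch is a fallback rather than part of any SIMD API.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#elif defined(__VSX__)
#include <altivec.h>
#endif

/* Hypothetical wrapper: out[i] = sqrt(|in[i]|) for four floats. */
static void sqrt_abs4(const float in[4], float out[4])
{
    assert(in != NULL && out != NULL);
#if defined(__SSE2__)
    __m128 v = _mm_loadu_ps(in);
    /* -0.0f has only the sign bit set, so ANDNOT clears each sign bit. */
    __m128 a = _mm_andnot_ps(_mm_set1_ps(-0.0f), v);
    _mm_storeu_ps(out, _mm_sqrt_ps(a));
#elif defined(__ARM_NEON) && defined(__aarch64__)
    float32x4_t v = vld1q_f32(in);
    vst1q_f32(out, vsqrtq_f32(vabsq_f32(v)));
#elif defined(__VSX__)
    __vector float v = vec_xl(0, in);
    vec_xst(vec_sqrt(vec_abs(v)), 0, out);
#else
    /* Scalar fallback for targets without one of the paths above. */
    for (int i = 0; i < 4; ++i)
        out[i] = sqrtf(fabsf(in[i]));
#endif
}
```

The SSE branch needs the sign-mask idiom because the demo uses no dedicated absolute-value intrinsic there, while the NEON and VSX branches call `vabsq_f32` and `vec_abs` directly. The scalar branch may need the math library at link time.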

Powerful Features

Everything you need for efficient SIMD development

Supported SIMD Architectures

Based on SIMD.ai subscription plans

Multi-architecture support

All major SIMD architectures supported

Intel SSE4.2, Intel AVX2, Intel AVX512, Arm NEON, Power VSX, IBM Z

Coming Soon

Additional architectures in development

RVV 1.0, Arm SVE/SVE2, Arm SME2, Loongson LSX/LASX, MIPS MSA, Intel AMX, Power MMA